diff --git "a/wandb/run-20220326_171130-bdf5nvyg/files/output.log" "b/wandb/run-20220326_171130-bdf5nvyg/files/output.log" new file mode 100644--- /dev/null +++ "b/wandb/run-20220326_171130-bdf5nvyg/files/output.log" @@ -0,0 +1,6146 @@ + + 0%| | 0/2230 [00:00> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:11:34,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:11:35,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:11:36,346 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:11:37,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:11:38,171 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:11:39,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:11:40,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:11:41,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:11:41,856 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:11:43,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:11:43,655 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:11:44,827 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:11:45,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:11:46,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:11:47,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:11:48,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:11:49,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:11:50,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:11:51,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:11:52,220 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:11:52,874 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:11:54,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:11:54,651 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:11:55,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:11:56,457 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:11:57,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:11:58,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:11:59,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:00,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:01,579 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:02,220 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 1/2230 [00:30<18:41:39, 30.19s/it] + 0%| | 1/2230 [00:30<18:41:39, 30.19s/it][WARNING|modeling_bart.py:1051] 2022-03-26 17:12:03,403 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:04,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:05,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:12:05,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:06,920 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:07,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:08,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:09,325 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:10,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:11,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:12,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:12,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:13,986 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:14,626 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:15,750 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:12:16,393 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:17,496 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:18,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:19,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:19,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:21,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:21,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:22,804 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:23,456 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:24,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:25,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:26,357 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:12:27,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:28,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:28,722 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:29,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:30,516 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 2/2230 [00:58<18:01:35, 29.13s/it] + 0%| | 2/2230 [00:58<18:01:35, 29.13s/it][WARNING|modeling_bart.py:1051] 2022-03-26 17:12:31,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:32,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:33,560 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:34,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:35,264 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:35,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:37,000 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:12:37,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:38,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:39,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:40,498 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:41,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:42,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:42,868 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:43,998 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:44,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:45,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:46,364 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:47,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:12:48,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:49,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:49,837 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:50,943 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:51,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:52,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:53,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:54,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:55,035 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:56,153 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:12:56,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.6797, 'learning_rate': 1.2e-06, 'epoch': 0.01} +[WARNING|modeling_bart.py:1051] 2022-03-26 17:12:57,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:12:58,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 3/2230 [01:26<17:42:13, 28.62s/it][WARNING|modeling_bart.py:1051] 2022-03-26 17:12:59,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:13:00,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:01,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:13:02,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:03,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:13:03,753 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:04,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:13:05,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:06,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:13:07,224 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:08,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:13:08,935 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:10,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:13:10,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:11,752 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:13:12,354 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:13,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:13:14,062 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:15,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:13:15,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:16,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:13:17,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:18,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:13:19,232 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:20,345 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:13:20,947 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:22,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:13:22,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:23,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:13:24,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:25,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:13:26,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 4/2230 [01:54<17:25:37, 28.18s/it] + 0%|▏ | 4/2230 [01:54<17:25:37, 28.18s/it][WARNING|modeling_bart.py:1051] 2022-03-26 17:13:27,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:13:27,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 4/2230 [01:54<17:25:37, 28.18s/it][WARNING|modeling_bart.py:1051] 2022-03-26 17:13:27,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:30,640 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:27,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:34,014 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:27,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:34,014 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:27,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:37,421 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:27,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:37,421 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:27,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:40,803 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:27,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:44,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:27,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:44,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:27,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:47,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:27,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:47,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:27,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:50,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:27,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:50,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:27,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 0%|▏ | 5/2230 [02:21<17:11:18, 27.81s/it][WARNING|modeling_bart.py:1051] 2022-03-26 17:13:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 0%|▏ | 5/2230 [02:21<17:11:18, 27.81s/it][WARNING|modeling_bart.py:1051] 2022-03-26 17:13:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:57,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:13:57,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:01,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:04,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:04,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:07,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:11,200 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:11,200 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:14,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:14,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:17,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:13:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 0%|▏ | 6/2230 [02:48<17:00:49, 27.54s/it] Setting `use_cache=False`...1] 2022-03-26 17:13:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 0%|▏ | 6/2230 [02:48<17:00:49, 27.54s/it] Setting `use_cache=False`...1] 2022-03-26 17:13:54,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 0%|▏ | 6/2230 [02:48<17:00:49, 27.54s/it][WARNING|modeling_bart.py:1051] 2022-03-26 17:14:21,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:24,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:14:21,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:24,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:14:21,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:29,217 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:32,509 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:35,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:39,254 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:42,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:45,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 0%|▏ | 7/2230 [03:16<17:05:31, 27.68s/it][WARNING|modeling_bart.py:1051] 2022-03-26 17:14:49,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 8.2504, 'learning_rate': 3.6e-06, 'epoch': 0.03}
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:52,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:56,169 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:14:59,460 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:02,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:06,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:09,530 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:12,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 0%|▎ | 8/2230 [03:43<16:55:52, 27.43s/it][WARNING|modeling_bart.py:1051] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:19,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:22,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:26,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:29,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 7.6117, 'learning_rate': 4.8e-06, 'epoch': 0.04} +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 7.1625, 'learning_rate': 5.399999999999999e-06, 'epoch': 0.04} +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 7.035, 'learning_rate': 5.999999999999999e-06, 'epoch': 0.05} +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 6.8002, 'learning_rate': 6.599999999999999e-06, 'epoch': 0.05} +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 6.5852, 'learning_rate': 7.2e-06, 'epoch': 0.06} +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 6.4332, 'learning_rate': 7.799999999999998e-06, 'epoch': 0.06} +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 6.2015, 'learning_rate': 8.4e-06, 'epoch': 0.07} +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 6.037, 'learning_rate': 8.999999999999999e-06, 'epoch': 0.07} +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:15:32,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 5.9392, 'learning_rate': 9.6e-06, 'epoch': 0.08}
+{'loss': 5.7954, 'learning_rate': 1.02e-05, 'epoch': 0.08}
+{'loss': 5.6019, 'learning_rate': 1.0799999999999998e-05, 'epoch': 0.09}
+{'loss': 5.6175, 'learning_rate': 1.14e-05, 'epoch': 0.09}
+{'loss': 5.481, 'learning_rate': 1.1999999999999999e-05, 'epoch': 0.09}
+{'loss': 5.4746, 'learning_rate': 1.26e-05, 'epoch': 0.1}
+ 1%|▊ | 23/2230 [10:05<15:04:54, 24.60s/it]
+{'loss': 5.3499, 'learning_rate': 1.3199999999999997e-05, 'epoch': 0.11}
+{'loss': 5.2045, 'learning_rate': 1.3799999999999998e-05, 'epoch': 0.11}
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:22:41,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.2217, 'learning_rate': 1.44e-05, 'epoch': 0.12}
+{'loss': 5.0872, 'learning_rate': 1.4999999999999999e-05, 'epoch': 0.12}
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:22:41,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:22:41,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:22:41,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:22:41,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:22:41,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:22:41,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:22:41,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:22:41,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.1305, 'learning_rate': 1.5599999999999996e-05, 'epoch': 0.13} +[WARNING|modeling_bart.py:1051] 2022-03-26 17:22:41,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:22:41,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:22:41,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:22:41,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:22:41,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:23:48,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:23:48,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:23:53,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:23:53,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:23:53,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 29/2230 [12:26<14:19:49, 23.44s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 29/2230 [12:26<14:19:49, 23.44s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.9678, 'learning_rate': 1.6199999999999997e-05, 'epoch': 0.13} + 1%|█ | 29/2230 [12:26<14:19:49, 23.44s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 29/2230 [12:26<14:19:49, 23.44s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 1%|█ | 29/2230 [12:26<14:19:49, 23.44s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 29/2230 [12:26<14:19:49, 23.44s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 29/2230 [12:26<14:19:49, 23.44s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 29/2230 [12:26<14:19:49, 23.44s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:24:15,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:24:15,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:24:15,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [12:49<14:09:24, 23.17s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [12:49<14:09:24, 23.17s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 5.0943, 'learning_rate': 1.68e-05, 'epoch': 0.13} + 1%|█ | 30/2230 [12:49<14:09:24, 23.17s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [12:49<14:09:24, 23.17s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [12:49<14:09:24, 23.17s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [12:49<14:09:24, 23.17s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [12:49<14:09:24, 23.17s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [12:49<14:09:24, 23.17s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [12:49<14:09:24, 23.17s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [12:49<14:09:24, 23.17s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [12:49<14:09:24, 23.17s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 31/2230 [13:11<14:00:47, 22.94s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 1%|█ | 31/2230 [13:11<14:00:47, 22.94s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.0962, 'learning_rate': 1.74e-05, 'epoch': 0.14} + 1%|█ | 31/2230 [13:11<14:00:47, 22.94s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 31/2230 [13:11<14:00:47, 22.94s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 31/2230 [13:11<14:00:47, 22.94s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 31/2230 [13:11<14:00:47, 22.94s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 31/2230 [13:11<14:00:47, 22.94s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:24:58,500 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:24:58,500 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:25:02,590 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:25:06,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.9747, 'learning_rate': 1.7999999999999997e-05, 'epoch': 0.14}
+[WARNING|modeling_utils.py:388] 2022-03-26 17:25:06,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:25:23,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:25:27,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.9798, 'learning_rate': 1.8599999999999998e-05, 'epoch': 0.15}
+[WARNING|modeling_utils.py:388] 2022-03-26 17:25:27,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:25:39,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:25:49,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.9415, 'learning_rate': 1.92e-05, 'epoch': 0.15}
+[WARNING|modeling_utils.py:388] 2022-03-26 17:25:53,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:26:03,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:26:13,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 17:26:22,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:26:30,378 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.9001, 'learning_rate': 2.04e-05, 'epoch': 0.16}
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:26:30,378 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:26:36,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:26:42,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 17:26:50,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.995, 'learning_rate': 2.1e-05, 'epoch': 0.17} +[WARNING|modeling_bart.py:1051] 2022-03-26 17:26:55,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:26:55,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:26:55,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:26:55,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:27:03,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:27:03,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:27:07,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:27:07,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 2%|█▎ | 38/2230 [15:39<12:36:33, 20.71s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 2%|█▎ | 38/2230 [15:39<12:36:33, 20.71s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:27:13,534 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:27:13,534 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:27:13,534 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:27:19,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:27:21,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:27:24,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:27:24,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:27:24,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:27:29,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:27:29,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.9133, 'learning_rate': 2.2199999999999998e-05, 'epoch': 0.17} +[WARNING|modeling_bart.py:1051] 2022-03-26 17:27:34,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:27:36,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:27:41,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:27:44,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:27:46,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 2%|█▍ | 40/2230 [16:15<11:51:56, 19.51s/it] +[WARNING|modeling_utils.py:388] 2022-03-26 17:27:50,134 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:27:52,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:27:54,412 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 17:27:58,247 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:00,320 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:02,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:04,370 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:06,500 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:08,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:10,524 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:12,478 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:14,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:16,345 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:18,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:20,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:22,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:23,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:25,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:27,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:31,199 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:32,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:34,721 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:36,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:38,286 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:39,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:43,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:44,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:47,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:48,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:50,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:53,670 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:55,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:56,686 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:28:58,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:01,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:02,620 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:05,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:06,675 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:09,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:10,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:13,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:15,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:17,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:19,031 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:21,277 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:23,519 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:25,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:27,654 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:29,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:31,616 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:33,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:36,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:36,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:39,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:41,679 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:43,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6317, 'learning_rate': 2.88e-05, 'epoch': 0.22} +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:47,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:51,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:54,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:29:58,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:01,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:05,500 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:09,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:12,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:16,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:19,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:23,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:26,582 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:30,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:33,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:37,029 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:40,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 6.0773, 'learning_rate': 2.9999999999999997e-05, 'epoch': 0.23} +[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:44,080 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:47,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:50,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:54,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:30:57,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:01,277 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:04,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:08,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.8237, 'learning_rate': 3.06e-05, 'epoch': 0.24} +[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:11,668 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:14,989 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:18,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:18,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:21,750 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:21,750 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:25,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:28,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:28,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:31,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:35,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.4014, 'learning_rate': 3.119999999999999e-05, 'epoch': 0.24}
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:38,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.0832, 'learning_rate': 3.1799999999999994e-05, 'epoch': 0.25}
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:31:38,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 3%|█▉ | 56/2230 [20:57<15:10:35, 25.13s/it]
+{'loss': 4.8432, 'learning_rate': 3.2399999999999995e-05, 'epoch': 0.25}
+ 3%|█▉ | 56/2230 [20:57<15:10:35, 25.13s/it]
+{'loss': 4.8566, 'learning_rate': 3.2999999999999996e-05, 'epoch': 0.26}
+ 3%|█▉ | 56/2230 [20:57<15:10:35, 25.13s/it]
+{'loss': 4.8108, 'learning_rate': 3.36e-05, 'epoch': 0.26}
+ 3%|█▉ | 56/2230 [20:57<15:10:35, 25.13s/it]
+{'loss': 4.7935, 'learning_rate': 3.42e-05, 'epoch': 0.26}
+ 3%|█▉ | 56/2230 [20:57<15:10:35, 25.13s/it]
+{'loss': 4.8346, 'learning_rate': 3.48e-05, 'epoch': 0.27}
+ 3%|█▉ | 56/2230 [20:57<15:10:35, 25.13s/it]
+ 3%|██▏ | 61/2230 [23:11<15:45:05, 26.14s/it]
+{'loss': 4.8708, 'learning_rate': 3.539999999999999e-05, 'epoch': 0.27}
+ 3%|██▏ | 61/2230 [23:11<15:45:05, 26.14s/it]
+{'loss': 4.7303, 'learning_rate': 3.5999999999999994e-05, 'epoch': 0.28}
+ 3%|██▏ | 61/2230 [23:11<15:45:05, 26.14s/it]
+{'loss': 4.8369, 'learning_rate': 3.6599999999999995e-05, 'epoch': 0.28}
+ 3%|██▏ | 61/2230 [23:11<15:45:05, 26.14s/it]
+ 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it]
+{'loss': 4.7123, 'learning_rate': 3.7199999999999996e-05, 'epoch': 0.29}
+ 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it]
+{'loss': 4.6835, 'learning_rate': 3.78e-05, 'epoch': 0.29}
+ 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it]
+{'loss': 4.7015, 'learning_rate': 3.84e-05, 'epoch': 0.3}
+ 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it]
+ 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6597, 'learning_rate': 3.9e-05, 'epoch': 0.3} + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 64/2230 [24:29<15:39:28, 26.02s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7459, 'learning_rate': 4.02e-05, 'epoch': 0.31} + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6883, 'learning_rate': 4.08e-05, 'epoch': 0.31} + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6348, 'learning_rate': 4.14e-05, 'epoch': 0.32} + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.6861, 'learning_rate': 4.2e-05, 'epoch': 0.32} + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5766, 'learning_rate': 4.259999999999999e-05, 'epoch': 0.33} + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6444, 'learning_rate': 4.319999999999999e-05, 'epoch': 0.33} + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7433, 'learning_rate': 4.3799999999999994e-05, 'epoch': 0.34} + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.6555, 'learning_rate': 4.4399999999999995e-05, 'epoch': 0.34} + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|█���▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 3%|██▍ | 68/2230 [26:09<15:08:11, 25.20s/it]
+{'loss': 4.5683, 'learning_rate': 4.4999999999999996e-05, 'epoch': 0.35}
+[WARNING|modeling_utils.py:388] 2022-03-26 17:41:42,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.6298, 'learning_rate': 4.56e-05, 'epoch': 0.35}
+{'loss': 4.5865, 'learning_rate': 4.62e-05, 'epoch': 0.35}
+[WARNING|modeling_utils.py:388] 2022-03-26 17:42:15,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.6905, 'learning_rate': 4.68e-05, 'epoch': 0.36}
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:42:49,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.5504, 'learning_rate': 4.7399999999999993e-05, 'epoch': 0.36}
+{'loss': 4.6262, 'learning_rate': 4.7999999999999994e-05, 'epoch': 0.37}
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:43:20,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 17:43:34,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.5915, 'learning_rate': 4.8599999999999995e-05, 'epoch': 0.37}
+[WARNING|modeling_utils.py:388] 2022-03-26 17:43:38,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:43:45,009 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:43:55,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.6035, 'learning_rate': 4.9199999999999997e-05, 'epoch': 0.38}
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:44:01,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 17:44:06,954 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:44:17,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:44:23,500 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:44:29,749 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:44:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.493, 'learning_rate': 5.04e-05, 'epoch': 0.39}
+[WARNING|modeling_utils.py:388] 2022-03-26 17:44:43,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:44:47,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 17:44:51,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 4%|███ | 87/2230 [33:23<12:18:14, 20.67s/it]
+[WARNING|modeling_utils.py:388] 2022-03-26 17:44:58,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:45:04,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:10,439 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 17:45:14,379 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.5418, 'learning_rate': 5.1599999999999994e-05, 'epoch': 0.39}
+[WARNING|modeling_utils.py:388] 2022-03-26 17:45:20,327 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:45:22,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:26,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:29,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 17:45:32,890 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:37,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:39,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:39,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:39,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:44,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:46,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:49,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:51,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:51,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:53,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:55,721 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 17:45:55,721 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:45:59,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:01,351 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:46:03,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:05,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:07,443 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:09,555 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:09,555 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:11,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:13,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:46:15,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:17,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:19,241 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:21,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:23,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:23,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:25,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:46:26,839 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:28,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:30,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:34,011 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:35,749 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:37,544 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:39,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:46:39,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:41,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:42,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:44,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:47,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:47,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:51,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:46:53,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:53,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:55,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:58,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:46:59,628 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:01,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:03,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:47:03,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:05,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:08,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:09,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:12,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:13,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:16,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:47:16,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:18,457 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:19,639 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:21,930 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:24,156 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:26,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:26,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:47:28,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:30,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:32,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:34,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:34,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:36,071 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:38,712 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:47:40,480 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:40,480 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:42,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:44,364 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:45,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:45,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:45,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:47:48,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:52,507 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:52,507 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:56,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:56,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:47:59,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:03,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:48:03,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:06,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:06,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:10,112 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:10,112 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:13,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:13,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:48:17,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:17,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:20,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:20,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:24,287 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:24,287 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:27,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:48:31,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:31,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:34,605 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:38,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:38,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:41,490 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:41,490 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 5.8072, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.46} +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:45,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:45,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:48,586 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:51,976 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:51,976 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:55,398 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:48:55,398 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:48:58,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:49:02,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:49:02,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:49:05,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:49:05,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:49:09,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:49:09,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:49:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:49:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:49:15,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:49:19,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:49:19,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:49:22,444 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:49:22,444 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:49:25,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:49:29,183 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:49:32,496 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:49:35,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 17:49:39,380 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 17:49:42,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 5%|███▋ | 105/2230 [38:32<14:15:38, 24.16s/it] +{'loss': 4.9539, 'learning_rate': 6.18e-05, 'epoch': 0.47} +{'loss': 4.8815, 'learning_rate': 6.239999999999999e-05, 'epoch': 0.48}
+{'loss': 4.6952, 'learning_rate': 6.299999999999999e-05, 'epoch': 0.48}
+{'loss': 4.7131, 'learning_rate': 6.359999999999999e-05, 'epoch': 0.48}
+ 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it] +{'loss': 4.6939, 'learning_rate': 6.419999999999999e-05, 'epoch': 0.49}
+{'loss': 4.6896, 'learning_rate': 6.479999999999999e-05, 'epoch': 0.49}
+{'loss': 4.6205, 'learning_rate': 6.539999999999999e-05, 'epoch': 0.5}
+{'loss': 4.6257, 'learning_rate': 6.599999999999999e-05, 'epoch': 0.5}
+{'loss': 4.63, 'learning_rate': 6.659999999999999e-05, 'epoch': 0.51}
+{'loss': 4.5141, 'learning_rate': 6.72e-05, 'epoch': 0.51}
+{'loss': 4.5986, 'learning_rate': 6.78e-05, 'epoch': 0.52}
+ 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.55, 'learning_rate': 6.84e-05, 'epoch': 0.52} + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|███▊ | 109/2230 [40:19<15:17:24, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 117/2230 [43:42<14:45:37, 25.15s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 117/2230 [43:42<14:45:37, 25.15s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5866, 'learning_rate': 6.9e-05, 'epoch': 0.52} + 5%|████ | 117/2230 [43:42<14:45:37, 25.15s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 117/2230 [43:42<14:45:37, 25.15s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 117/2230 [43:42<14:45:37, 25.15s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 117/2230 [43:42<14:45:37, 25.15s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 117/2230 [43:42<14:45:37, 25.15s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 117/2230 [43:42<14:45:37, 25.15s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|████ | 117/2230 [43:42<14:45:37, 25.15s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 117/2230 [43:42<14:45:37, 25.15s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 117/2230 [43:42<14:45:37, 25.15s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 117/2230 [43:42<14:45:37, 25.15s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5122, 'learning_rate': 6.96e-05, 'epoch': 0.53} + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.5169, 'learning_rate': 7.02e-05, 'epoch': 0.53} + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|████ | 118/2230 [44:06<14:38:59, 24.97s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 120/2230 [44:56<14:35:19, 24.89s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 120/2230 [44:56<14:35:19, 24.89s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5464, 'learning_rate': 7.079999999999999e-05, 'epoch': 0.54} + 5%|████▏ | 120/2230 [44:56<14:35:19, 24.89s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 120/2230 [44:56<14:35:19, 24.89s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 120/2230 [44:56<14:35:19, 24.89s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 120/2230 [44:56<14:35:19, 24.89s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 120/2230 [44:56<14:35:19, 24.89s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 120/2230 [44:56<14:35:19, 24.89s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|████▏ | 120/2230 [44:56<14:35:19, 24.89s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 120/2230 [44:56<14:35:19, 24.89s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 120/2230 [44:56<14:35:19, 24.89s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 120/2230 [44:56<14:35:19, 24.89s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 121/2230 [45:20<14:27:50, 24.69s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 121/2230 [45:20<14:27:50, 24.69s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4779, 'learning_rate': 7.139999999999999e-05, 'epoch': 0.54} +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.517, 'learning_rate': 7.199999999999999e-05, 'epoch': 0.55} +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4728, 'learning_rate': 7.259999999999999e-05, 'epoch': 0.55} +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4484, 'learning_rate': 7.319999999999999e-05, 'epoch': 0.56} +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6016, 'learning_rate': 7.379999999999999e-05, 'epoch': 0.56} +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:56:57,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4771, 'learning_rate': 7.439999999999999e-05, 'epoch': 0.57} +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:58:37,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5307, 'learning_rate': 7.5e-05, 'epoch': 0.57} +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:18,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:18,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:18,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:59:18,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:18,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:18,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:18,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:18,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:18,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5465, 'learning_rate': 7.56e-05, 'epoch': 0.57} +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 17:59:37,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5136, 'learning_rate': 7.62e-05, 'epoch': 0.58} + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.5539, 'learning_rate': 7.68e-05, 'epoch': 0.58} + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▍ | 129/2230 [48:29<13:36:26, 23.32s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:00:42,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:00:42,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:00:46,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:00:46,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.549, 'learning_rate': 7.74e-05, 'epoch': 0.59} +[WARNING|modeling_utils.py:388] 2022-03-26 18:00:50,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:00:50,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:00:50,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:00:50,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:00:50,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:01:01,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:01,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:01,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:01,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:09,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:09,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3814, 'learning_rate': 7.8e-05, 'epoch': 0.59} +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:09,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:01:09,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:17,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:17,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:17,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:17,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:17,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:17,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:01:17,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:17,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▌ | 133/2230 [49:59<13:08:01, 22.55s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:33,749 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:33,749 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:37,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:37,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:01:37,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:37,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:37,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:48,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:01:48,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▋ | 134/2230 [50:20<12:51:56, 22.10s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▋ | 134/2230 [50:20<12:51:56, 22.10s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.492, 'learning_rate': 7.92e-05, 'epoch': 0.6} + 6%|████▋ | 134/2230 [50:20<12:51:56, 22.10s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 6%|████▋ | 134/2230 [50:20<12:51:56, 22.10s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▋ | 134/2230 [50:20<12:51:56, 22.10s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:02,569 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:02,569 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:02,569 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:02,569 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:10,312 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:02:10,312 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:10,312 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5401, 'learning_rate': 7.98e-05, 'epoch': 0.61} +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:16,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:16,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:16,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:16,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:16,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:02:16,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:16,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:16,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:02:32,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:02:32,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4804, 'learning_rate': 8.04e-05, 'epoch': 0.61} +[WARNING|modeling_bart.py:1051] 2022-03-26 18:02:32,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:02:32,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:02:40,954 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:40,954 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:40,954 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:40,954 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:02:49,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:02:49,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:53,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:02:53,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5568, 'learning_rate': 8.1e-05, 'epoch': 0.61} +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:53,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:59,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:59,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:59,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:59,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:02:59,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:02:59,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:03:11,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:03:11,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:03:11,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4821, 'learning_rate': 8.16e-05, 'epoch': 0.62} +[WARNING|modeling_utils.py:388] 2022-03-26 18:03:17,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:03:17,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:03:21,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:03:21,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 18:03:25,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:03:27,920 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:03:27,920 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:03:32,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:03:32,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:03:32,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 18:03:36,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:03:36,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:03:40,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:03:42,336 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:03:42,336 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 18:03:46,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:03:48,208 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:03:50,392 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:03:50,392 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:03:52,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:03:52,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:03:56,487 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:03:58,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:00,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:02,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:04,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:06,706 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:06,706 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:08,792 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:10,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:12,668 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:14,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:16,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:18,421 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:20,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:20,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:22,176 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:24,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:25,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:27,768 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:31,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:33,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:34,824 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:34,824 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:36,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:38,345 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:40,034 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:43,370 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:44,941 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:46,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:48,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:48,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:50,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:53,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:55,243 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:56,752 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:04:59,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:01,024 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:01,024 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:03,826 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:05,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:07,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:09,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:11,659 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:11,659 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:14,183 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:15,369 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:17,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:18,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:21,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:21,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:23,355 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:25,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:27,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:29,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:29,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:31,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:34,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:35,804 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:37,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:37,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:39,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:42,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:43,467 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:43,467 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.4715, 'learning_rate': 8.879999999999999e-05, 'epoch': 0.67}
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:47,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:47,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:51,363 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:51,363 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:54,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:58,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:05:58,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:01,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:01,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:05,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:05,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:09,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:12,553 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:12,553 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 6.287, 'learning_rate': 8.939999999999999e-05, 'epoch': 0.68}
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:16,128 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:16,128 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:19,587 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:23,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:23,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:26,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:26,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:29,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:29,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:33,425 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:36,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:36,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:40,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:40,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 6.1091, 'learning_rate': 8.999999999999999e-05, 'epoch': 0.68}
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:43,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:47,307 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:47,307 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:50,706 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:50,706 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:54,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:57,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:06:57,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:00,984 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:00,984 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:04,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:07,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:07,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:07,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:11,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:11,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:14,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:17,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:17,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:21,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:21,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:24,673 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:28,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:28,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:31,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:31,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:31,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:34,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:38,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:38,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:41,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:07:41,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8993, 'learning_rate': 9.18e-05, 'epoch': 0.7} +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8036, 'learning_rate': 9.24e-05, 'epoch': 0.7} +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7089, 'learning_rate': 9.3e-05, 'epoch': 0.7} +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:07:47,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.658, 'learning_rate': 9.36e-05, 'epoch': 0.71} + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5872, 'learning_rate': 9.419999999999999e-05, 'epoch': 0.71} + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.6761, 'learning_rate': 9.479999999999999e-05, 'epoch': 0.72} + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5023, 'learning_rate': 9.539999999999999e-05, 'epoch': 0.72} + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5411, 'learning_rate': 9.599999999999999e-05, 'epoch': 0.73} + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5728, 'learning_rate': 9.659999999999999e-05, 'epoch': 0.73} + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]
+{'loss': 4.5043, 'learning_rate': 9.719999999999999e-05, 'epoch': 0.74}
+ 7%|█████▍ | 158/2230 [57:50<14:56:07, 25.95s/it]
+[WARNING|modeling_utils.py:388] 2022-03-26 18:12:21,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.5902, 'learning_rate': 9.779999999999999e-05, 'epoch': 0.74}
+[WARNING|modeling_utils.py:388] 2022-03-26 18:12:21,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]
+{'loss': 4.4932, 'learning_rate': 9.839999999999999e-05, 'epoch': 0.74}
+ 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]
+{'loss': 4.5148, 'learning_rate': 9.9e-05, 'epoch': 0.75}
+ 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]
+{'loss': 4.5009, 'learning_rate': 9.96e-05, 'epoch': 0.75}
+ 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]
+{'loss': 4.3554, 'learning_rate': 0.0001002, 'epoch': 0.76}
+ 7%|█████▌ | 166/2230 [1:01:16<14:31:47, 25.34s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:14:16,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:14:22,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 18:14:30,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:14:36,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:14:51,267 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.4837, 'learning_rate': 0.0001014, 'epoch': 0.77}
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:14:57,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.3605, 'learning_rate': 0.000102, 'epoch': 0.77}
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:14:57,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it]
+{'loss': 4.423, 'learning_rate': 0.0001026, 'epoch': 0.78}
+ 8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it]
+{'loss': 4.3804, 'learning_rate': 0.00010319999999999999, 'epoch': 0.78}
+ 8%|█████▊ | 173/2230 [1:04:07<13:55:40, 24.38s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.4725, 'learning_rate': 0.00010379999999999999, 'epoch': 0.78}
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:16:09,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4212, 'learning_rate': 0.00010439999999999999, 'epoch': 0.79} + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4313, 'learning_rate': 0.00010499999999999999, 'epoch': 0.79} + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3636, 'learning_rate': 0.00010559999999999998, 'epoch': 0.8} + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4336, 'learning_rate': 0.00010619999999999998, 'epoch': 0.8} + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3958, 'learning_rate': 0.00010679999999999998, 'epoch': 0.81} + 8%|█████▉ | 176/2230 [1:05:19<13:42:47, 24.03s/it] Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:18:28,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:28,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:28,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:28,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:28,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:28,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:28,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.529, 'learning_rate': 0.00010739999999999998, 'epoch': 0.81} +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3276, 'learning_rate': 0.00010799999999999998, 'epoch': 0.82} +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:18:42,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:19:30,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:19:30,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5321, 'learning_rate': 0.00010859999999999998, 'epoch': 0.82} +[WARNING|modeling_utils.py:388] 2022-03-26 18:19:34,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:19:34,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:19:38,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:19:38,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:19:42,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:19:42,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:19:42,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:19:42,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:19:42,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:19:42,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4487, 'learning_rate': 0.00010919999999999998, 'epoch': 0.83} +[WARNING|modeling_utils.py:388] 2022-03-26 18:19:42,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:19:42,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:19:42,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:19:42,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:19:42,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:20:04,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:20:04,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:20:04,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:20:04,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:20:04,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|██████▏ | 185/2230 [1:08:40<12:23:57, 21.83s/it]g-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:20:15,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:20:15,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:20:15,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:20:15,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:20:15,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:20:15,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:20:27,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:20:27,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:20:27,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:20:27,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 17:15:16,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 8%|██████▎ | 186/2230 [1:09:00<12:10:20, 21.44s/it] +[WARNING|modeling_utils.py:388] 2022-03-26 18:20:35,514 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:20:45,626 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 8%|██████▎ | 187/2230 [1:09:21<11:57:53, 21.08s/it][WARNING|modeling_bart.py:1051] 2022-03-26 18:20:54,029 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.407, 'learning_rate': 0.00011099999999999999, 'epoch': 0.84} +[WARNING|modeling_bart.py:1051] 2022-03-26 18:21:00,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 18:21:09,959 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 8%|██████▎ | 188/2230 [1:09:41<11:50:45, 20.88s/it]
+[WARNING|modeling_utils.py:388] 2022-03-26 18:21:16,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 18:21:20,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:21:24,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:21:30,346 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:21:32,676 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.3983, 'learning_rate': 0.00011219999999999999, 'epoch': 0.85} +[WARNING|modeling_utils.py:388] 2022-03-26 18:21:38,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:21:40,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:21:45,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:21:47,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:21:51,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.4501, 'learning_rate': 0.00011279999999999999, 'epoch': 0.85} +[WARNING|modeling_bart.py:1051] 2022-03-26 18:21:55,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:21:57,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 18:22:01,144 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:22:03,305 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:22:05,463 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:22:07,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:22:09,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:22:11,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:22:13,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:22:15,978 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:22:17,968 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:22:19,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:22:21,909 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:22:23,868 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.3934, 'learning_rate': 0.00011399999999999999, 'epoch': 0.86} +[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:27,498 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:29,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:31,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:33,181 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:35,034 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:36,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 9%|██████▌ | 193/2230 [1:11:07<9:46:56, 17.29s/it] +[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:42,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:44,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:45,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:47,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:49,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:51,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:53,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|██████▌ | 194/2230 [1:11:22<9:19:45, 16.50s/it][WARNING|modeling_bart.py:1051] 2022-03-26 18:22:55,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:22:58,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:00,107 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:01,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:03,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:06,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|██████▋ | 195/2230 [1:11:35<8:40:49, 15.36s/it][WARNING|modeling_bart.py:1051] 2022-03-26 18:23:07,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:09,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:12,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:13,540 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:16,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:17,560 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:20,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:21,528 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:24,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:26,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|██████▋ | 197/2230 [1:11:56<7:15:59, 12.87s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:30,042 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:32,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:34,316 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:36,323 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:38,369 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:40,218 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:42,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|██████▊ | 199/2230 [1:12:12<5:48:21, 10.29s/it][WARNING|modeling_bart.py:1051] 2022-03-26 18:23:44,639 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:46,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:48,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:49,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|██████▊ | 200/2230 [1:12:19<5:13:36, 9.27s/it][WARNING|modeling_bart.py:1051] 2022-03-26 18:23:52,667 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:23:56,506 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:00,108 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:03,609 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:07,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:10,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:14,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:17,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|██████▊ | 201/2230 [1:12:47<8:29:35, 15.07s/it][WARNING|modeling_bart.py:1051] 2022-03-26 18:24:21,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:24,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:27,852 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:31,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:34,536 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:37,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:41,239 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:44,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 9%|██████▊ | 202/2230 [1:13:14<10:31:00, 18.67s/it][WARNING|modeling_bart.py:1051] 2022-03-26 18:24:48,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:51,365 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:54,696 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:24:58,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:01,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:04,681 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:07,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:11,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 9%|██████▊ | 203/2230 [1:13:41<11:50:59, 21.05s/it][WARNING|modeling_bart.py:1051] 2022-03-26 18:25:14,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.0932, 'learning_rate': 0.00012059999999999999, 'epoch': 0.91}
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:17,952 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:21,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:24,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:27,863 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:31,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:34,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:37,571 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 9%|██████▊ | 204/2230 [1:14:07<12:43:56, 22.62s/it][WARNING|modeling_bart.py:1051] 2022-03-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:25:44,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.6482, 'learning_rate': 0.00012179999999999999, 'epoch': 0.92}
+{'loss': 4.7276, 'learning_rate': 0.0001224, 'epoch': 0.92}
+{'loss': 4.6034, 'learning_rate': 0.00012299999999999998, 'epoch': 0.93}
+{'loss': 4.58, 'learning_rate': 0.0001236, 'epoch': 0.93}
+{'loss': 4.4605, 'learning_rate': 0.00012419999999999998, 'epoch': 0.94}
+{'loss': 4.4878, 'learning_rate': 0.00012479999999999997, 'epoch': 0.94}
+ 9%|███████ | 211/2230 [1:17:02<13:36:11, 24.26s/it]
+{'loss': 4.502, 'learning_rate': 0.00012539999999999999, 'epoch': 0.95}
+{'loss': 4.5168, 'learning_rate': 0.00012599999999999997, 'epoch': 0.95}
+ 10%|███████▏ | 213/2230 [1:17:49<13:23:51, 23.91s/it]
+{'loss': 4.5251, 'learning_rate': 0.0001266, 'epoch': 0.96}
+[WARNING|modeling_utils.py:388] 2022-03-26 18:29:29,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:29:29,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:29:34,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:29:34,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:29:34,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:29:34,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:29:42,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:29:42,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4259, 'learning_rate': 0.00012719999999999997, 'epoch': 0.96} +[WARNING|modeling_utils.py:388] 2022-03-26 18:29:42,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:29:42,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:29:42,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:29:42,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:29:42,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:29:42,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:29:42,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:29:42,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:30:02,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:30:02,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:30:02,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3709, 'learning_rate': 0.0001278, 'epoch': 0.96} +[WARNING|modeling_utils.py:388] 2022-03-26 18:30:02,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:30:11,032 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:30:11,032 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:30:11,032 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:30:11,032 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:30:11,032 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:30:11,032 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:30:23,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:30:23,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:30:23,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:30:23,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5284, 'learning_rate': 0.00012839999999999998, 'epoch': 0.97} +[WARNING|modeling_bart.py:1051] 2022-03-26 18:30:23,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:30:23,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:30:35,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:30:35,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:30:35,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:30:35,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:25:40,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:30:43,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:30:50,029 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:30:56,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 18:30:59,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:31:04,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 10%|███████▎ | 218/2230 [1:19:33<11:39:47, 20.87s/it][WARNING|modeling_bart.py:1051] 2022-03-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.3609, 'learning_rate': 0.00012959999999999998, 'epoch': 0.98}
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:10,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:31:13,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:31:15,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:31:17,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:22,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:24,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:25,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:27,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:29,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:31,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:34,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:36,400 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:38,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:39,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:41,180 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:44,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:45,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:48,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:49,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:51,767 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:53,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:55,997 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:31:57,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:00,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:02,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:03,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:06,361 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:09,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:13,650 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:17,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:20,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:24,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:27,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:31,281 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:34,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:38,364 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:41,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:45,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:48,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:52,190 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:55,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 5.7454, 'learning_rate': 0.0001338, 'epoch': 1.01}
+{'loss': 5.6168, 'learning_rate': 0.0001344, 'epoch': 1.01}
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.1562, 'learning_rate': 0.000135, 'epoch': 1.02} +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.9135, 'learning_rate': 0.0001356, 'epoch': 1.02} +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6708, 'learning_rate': 0.0001362, 'epoch': 1.03} +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5292, 'learning_rate': 0.0001368, 'epoch': 1.03} +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4944, 'learning_rate': 0.0001374, 'epoch': 1.04} +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3521, 'learning_rate': 0.000138, 'epoch': 1.04} +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4591, 'learning_rate': 0.0001386, 'epoch': 1.04} +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 18:32:59,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.3587, 'learning_rate': 0.0001392, 'epoch': 1.05}
+{'loss': 4.3937, 'learning_rate': 0.00013979999999999998, 'epoch': 1.05}
+{'loss': 4.3795, 'learning_rate': 0.0001404, 'epoch': 1.06}
+{'loss': 4.2489, 'learning_rate': 0.00014099999999999998, 'epoch': 1.06}
+{'loss': 4.3007, 'learning_rate': 0.00014159999999999997, 'epoch': 1.07}
+{'loss': 4.2813, 'learning_rate': 0.0001422, 'epoch': 1.07}
+{'loss': 4.3177, 'learning_rate': 0.00014279999999999997, 'epoch': 1.08}
+{'loss': 4.0955, 'learning_rate': 0.0001434, 'epoch': 1.08}
+[WARNING|modeling_utils.py:388] 2022-03-26 18:40:15,668 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.1885, 'learning_rate': 0.00014399999999999998, 'epoch': 1.09}
+{'loss': 4.1224, 'learning_rate': 0.0001446, 'epoch': 1.09}
+{'loss': 4.2549, 'learning_rate': 0.00014519999999999998, 'epoch': 1.09}
+{'loss': 4.1249, 'learning_rate': 0.0001458, 'epoch': 1.1}
+[WARNING|modeling_utils.py:388] 2022-03-26 18:40:15,668 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1269, 'learning_rate': 0.00014639999999999998, 'epoch': 1.1} + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1725, 'learning_rate': 0.000147, 'epoch': 1.11} + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 246/2230 [1:30:25<13:25:59, 24.37s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 248/2230 [1:31:12<13:09:18, 23.89s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▎ | 248/2230 [1:31:12<13:09:18, 23.89s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2151, 'learning_rate': 0.00014759999999999998, 'epoch': 1.11} + 11%|████████▎ | 248/2230 [1:31:12<13:09:18, 23.89s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 11%|████████▎ | 248/2230 [1:31:12<13:09:18, 23.89s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1098, 'learning_rate': 0.0001482, 'epoch': 1.12} +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1609, 'learning_rate': 0.00014879999999999998, 'epoch': 1.12} +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:42:53,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0873, 'learning_rate': 0.0001494, 'epoch': 1.13} + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0377, 'learning_rate': 0.00015, 'epoch': 1.13} + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▍ | 251/2230 [1:32:22<12:55:26, 23.51s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:44:35,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:44:35,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:44:39,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:44:39,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1351, 'learning_rate': 0.00015059999999999997, 'epoch': 1.13} +[WARNING|modeling_utils.py:388] 2022-03-26 18:44:39,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:44:39,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:44:39,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:44:39,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:44:39,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:44:39,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:44:39,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:44:39,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:44:39,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:44:39,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:44:39,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0665, 'learning_rate': 0.0001512, 'epoch': 1.14} +[WARNING|modeling_utils.py:388] 2022-03-26 18:44:39,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:44:39,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:10,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:10,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:10,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:10,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:18,302 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:18,302 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:45:18,302 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:18,302 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0937, 'learning_rate': 0.00015179999999999998, 'epoch': 1.14} +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:26,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:26,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:26,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:26,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:26,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:45:26,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:26,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:26,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:26,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:26,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:26,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:46,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 18:45:46,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:46,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:46,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:46,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:46,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:46,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:46,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:45:46,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:46,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:46,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:45:46,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.0957, 'learning_rate': 0.00015299999999999998, 'epoch': 1.15} +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:11,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:11,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:15,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:46:15,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:15,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:21,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:21,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:21,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:21,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:21,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.1556, 'learning_rate': 0.0001536, 'epoch': 1.16} +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:21,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 18:46:33,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:46:33,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:46:33,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:39,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:39,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:39,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:46:39,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:39,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:39,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:49,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:49,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:49,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:55,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:46:55,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:46:55,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:02,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:02,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:02,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:08,194 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:08,194 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.0407, 'learning_rate': 0.0001548, 'epoch': 1.17} +[WARNING|modeling_bart.py:1051] 2022-03-26 18:47:12,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:47:12,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:16,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:16,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:16,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:16,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 18:47:24,734 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:47:27,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:47:27,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:47:27,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:31,208 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:31,208 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:34,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:34,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:47:38,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:47:38,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:42,842 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:45,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:45,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:45,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 18:47:49,309 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:47:49,309 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:53,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:53,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:53,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:59,541 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:47:59,541 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:03,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:03,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|████████▊ | 263/2230 [1:36:33<10:42:01, 19.58s/it] +[WARNING|modeling_utils.py:388] 2022-03-26 18:48:07,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:48:09,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:48:11,731 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:48:13,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:48:15,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 18:48:18,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:48:20,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:48:22,137 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 18:48:22,137 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.0776, 'learning_rate': 0.0001572, 'epoch': 1.18} +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:25,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:27,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:29,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:31,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:33,814 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:35,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:35,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:37,624 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:39,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:41,518 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:43,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:45,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:47,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:50,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:52,412 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:52,412 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:54,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:56,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:57,709 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:48:59,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:02,633 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:04,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:04,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:05,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:07,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:10,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:12,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:13,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:16,546 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:16,546 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:17,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:20,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:22,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:24,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:25,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:29,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:29,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:30,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:32,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:34,037 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:36,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:38,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:38,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:40,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:42,868 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:44,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:46,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:46,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:48,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:51,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:52,935 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:52,935 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:54,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:57,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:59,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:59,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:49:59,808 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:03,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:03,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:06,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:06,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:10,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:10,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:13,975 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:17,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:17,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:21,077 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:21,077 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:24,563 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:24,563 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:28,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:28,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:31,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:31,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:35,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:35,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:38,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:41,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:41,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:45,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:45,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:48,885 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:52,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:52,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:55,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:55,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:50:55,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:00,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:00,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:03,779 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:03,779 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:07,181 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:10,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:10,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:13,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:17,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:17,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:20,762 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:20,762 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:20,762 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:24,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:27,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:27,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:30,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:30,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:34,247 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:37,592 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:37,592 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:40,993 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:40,993 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:44,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:47,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:47,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:47,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:51,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:54,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:54,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.6105, 'learning_rate': 0.0001656, 'epoch': 1.25} +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6108, 'learning_rate': 0.0001662, 'epoch': 1.25} +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5733, 'learning_rate': 0.0001668, 'epoch': 1.26} +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4, 'learning_rate': 0.0001674, 'epoch': 1.26} +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4329, 'learning_rate': 0.000168, 'epoch': 1.26} +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 18:51:57,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▌ | 283/2230 [1:42:58<14:01:49, 25.94s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.3477, 'learning_rate': 0.0001686, 'epoch': 1.27}
+{'loss': 4.2902, 'learning_rate': 0.00016919999999999997, 'epoch': 1.27}
+[WARNING|modeling_utils.py:388] 2022-03-26 18:55:03,861 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.3448, 'learning_rate': 0.00016979999999999998, 'epoch': 1.28}
+ 13%|█████████▌ | 286/2230 [1:44:14<13:49:37, 25.61s/it]
+{'loss': 4.2629, 'learning_rate': 0.00017099999999999998, 'epoch': 1.29}
+{'loss': 4.2435, 'learning_rate': 0.00017159999999999997, 'epoch': 1.29}
+ 13%|█████████▋ | 289/2230 [1:45:30<13:41:02, 25.38s/it]
+{'loss': 4.1274, 'learning_rate': 0.00017219999999999998, 'epoch': 1.3}
+ 13%|█████████▊ | 290/2230 [1:45:55<13:34:22, 25.19s/it]
+{'loss': 4.28, 'learning_rate': 0.00017279999999999997, 'epoch': 1.3}
+ 13%|█████████▊ | 291/2230 [1:46:19<13:27:59, 25.00s/it]
+ 13%|█████████▊ | 292/2230 [1:46:44<13:22:54, 24.86s/it]
+{'loss': 4.1554, 'learning_rate': 0.00017399999999999997, 'epoch': 1.31}
+{'loss': 4.1921, 'learning_rate': 0.00017459999999999996, 'epoch': 1.31}
+{'loss': 4.1944, 'learning_rate': 0.00017519999999999998, 'epoch': 1.32}
+{'loss': 4.1347, 'learning_rate': 0.00017579999999999996, 'epoch': 1.32}
+{'loss': 4.1154, 'learning_rate': 0.00017639999999999998, 'epoch': 1.33}
+ 13%|█████████▉ | 297/2230 [1:48:45<12:57:47, 24.14s/it]
+{'loss': 4.122, 'learning_rate': 0.00017759999999999998, 'epoch': 1.34}
+{'loss': 4.1743, 'learning_rate': 0.00017819999999999997, 'epoch': 1.34}
+[WARNING|modeling_utils.py:388] 2022-03-26 19:01:25,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.1579, 'learning_rate': 0.00017879999999999998, 'epoch': 1.35}
+[WARNING|modeling_utils.py:388] 2022-03-26 19:01:25,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:01:25,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:01:25,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1053, 'learning_rate': 0.00017939999999999997, 'epoch': 1.35} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0905, 'learning_rate': 0.00017999999999999998, 'epoch': 1.35} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1271, 'learning_rate': 0.00018059999999999997, 'epoch': 1.36} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.0824, 'learning_rate': 0.00018119999999999999, 'epoch': 1.36}
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.0832, 'learning_rate': 0.00018179999999999997, 'epoch': 1.37}
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:01:40,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 19:03:40,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:03:40,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.0989, 'learning_rate': 0.0001824, 'epoch': 1.37}
+[WARNING|modeling_utils.py:388] 2022-03-26 19:03:44,312 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:03:50,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:03:50,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:04:00,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.137, 'learning_rate': 0.00018299999999999998, 'epoch': 1.38}
+[WARNING|modeling_utils.py:388] 2022-03-26 19:04:00,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:04:08,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:04:08,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:04:21,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:04:21,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.1543, 'learning_rate': 0.0001836, 'epoch': 1.38}
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:04:21,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 19:04:29,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:04:36,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:04:36,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:04:43,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:04:47,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:04:47,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:04:53,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:04:59,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:05:03,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:05:03,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.0728, 'learning_rate': 0.0001848, 'epoch': 1.39}
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:05:03,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:05:10,058 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:05:16,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 19:05:19,988 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:05:22,331 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.9376, 'learning_rate': 0.00018539999999999998, 'epoch': 1.39}
+[WARNING|modeling_utils.py:388] 2022-03-26 19:05:28,310 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:05:32,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:05:32,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 19:05:36,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:05:38,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:05:40,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.8833, 'learning_rate': 0.000186, 'epoch': 1.4}
+[WARNING|modeling_utils.py:388] 2022-03-26 19:05:46,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:05:48,966 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:05:56,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:05:58,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:06:00,819 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:06:00,819 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.0698, 'learning_rate': 0.00018659999999999998, 'epoch': 1.4}
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:04,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:07,000 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:09,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:11,216 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:13,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:15,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:17,377 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:19,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:21,469 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:23,452 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:25,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:27,413 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:29,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:31,250 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:33,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:35,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:36,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:38,825 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:40,659 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:42,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:45,912 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:47,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:49,425 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:51,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:52,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:56,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:57,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:06:59,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:02,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:04,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:05,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:07,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:11,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:14,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:15,688 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:18,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:19,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:20,974 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:24,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:25,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:28,008 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:30,394 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:31,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:33,789 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:36,094 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:38,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:40,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:42,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:44,270 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:46,077 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:48,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:50,504 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:52,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:54,409 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:55,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:07:59,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:02,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:02,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:06,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:09,986 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:13,518 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:17,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:17,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:20,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:24,081 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:27,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:31,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:34,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:38,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:41,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:44,859 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:48,286 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:51,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.3192, 'learning_rate': 0.0001938, 'epoch': 1.46} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:56,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:08:59,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:03,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:06,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:09,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:13,218 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:16,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:19,991 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.0983, 'learning_rate': 0.00019439999999999998, 'epoch': 1.46} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:23,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:26,744 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:30,080 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:33,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:36,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:40,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:40,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:43,504 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:46,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7318, 'learning_rate': 0.000195, 'epoch': 1.47} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:50,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:50,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6187, 'learning_rate': 0.00019559999999999998, 'epoch': 1.47} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:09:50,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it]
+{'loss': 4.4429, 'learning_rate': 0.0001962, 'epoch': 1.48}
+{'loss': 4.3152, 'learning_rate': 0.00019679999999999999, 'epoch': 1.48}
+{'loss': 4.4061, 'learning_rate': 0.0001974, 'epoch': 1.48}
+{'loss': 4.3511, 'learning_rate': 0.000198, 'epoch': 1.49}
+{'loss': 4.3352, 'learning_rate': 0.0001986, 'epoch': 1.49}
+{'loss': 4.2468, 'learning_rate': 0.0001992, 'epoch': 1.5}
+{'loss': 4.2837, 'learning_rate': 0.0001998, 'epoch': 1.5}
+{'loss': 4.1831, 'learning_rate': 0.0002004, 'epoch': 1.51}
+{'loss': 4.2178, 'learning_rate': 0.000201, 'epoch': 1.51}
+{'loss': 4.1241, 'learning_rate': 0.0002016, 'epoch': 1.52}
+{'loss': 4.1374, 'learning_rate': 0.0002022, 'epoch': 1.52}
+{'loss': 4.0581, 'learning_rate': 0.0002028, 'epoch': 1.52}
+ 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1244, 'learning_rate': 0.00020339999999999998, 'epoch': 1.53} + 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████ | 329/2230 [1:59:08<13:08:53, 24.90s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 15%|███████████▌ | 342/2230 [2:04:40<13:01:31, 24.84s/it]
+{'loss': 4.101, 'learning_rate': 0.000204, 'epoch': 1.53}
+ 15%|███████████▌ | 343/2230 [2:05:04<12:56:13, 24.68s/it]
+{'loss': 4.0908, 'learning_rate': 0.00020459999999999999, 'epoch': 1.54}
+[WARNING|modeling_utils.py:388] 2022-03-26 19:16:47,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.9862, 'learning_rate': 0.0002052, 'epoch': 1.54}
+ 15%|███████████▌ | 345/2230 [2:05:54<12:53:10, 24.61s/it]
+{'loss': 4.1008, 'learning_rate': 0.0002058, 'epoch': 1.55}
+{'loss': 4.0601, 'learning_rate': 0.00020639999999999998, 'epoch': 1.55}
+{'loss': 4.063, 'learning_rate': 0.00020699999999999996, 'epoch': 1.56}
+{'loss': 4.0309, 'learning_rate': 0.00020759999999999998, 'epoch': 1.56}
+{'loss': 4.0944, 'learning_rate': 0.00020819999999999996, 'epoch': 1.57}
+{'loss': 4.0536, 'learning_rate': 0.00020879999999999998, 'epoch': 1.57}
+[WARNING|modeling_utils.py:388] 2022-03-26 19:19:47,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.0323, 'learning_rate': 0.00020939999999999997, 'epoch': 1.57}
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:20:00,024 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 19:20:10,378 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.041, 'learning_rate': 0.00020999999999999998, 'epoch': 1.58}
+[WARNING|modeling_utils.py:388] 2022-03-26 19:20:18,868 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:20:23,029 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 16%|███████████▊ | 353/2230 [2:09:00<12:00:49, 23.04s/it]
+{'loss': 4.0926, 'learning_rate': 0.00021059999999999997, 'epoch': 1.58}
+[WARNING|modeling_utils.py:388] 2022-03-26 19:20:45,293 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:20:53,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.053, 'learning_rate': 0.00021119999999999996, 'epoch': 1.59} +[WARNING|modeling_utils.py:388] 2022-03-26 19:20:53,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:20:53,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:01,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:01,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9695, 'learning_rate': 0.00021179999999999997, 'epoch': 1.59} +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:21:06,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:21:40,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:21:40,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:21:40,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:21:40,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:21:40,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:21:40,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:21:52,728 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:52,728 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:52,728 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:21:52,728 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████ | 357/2230 [2:10:28<11:33:25, 22.21s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████ | 357/2230 [2:10:28<11:33:25, 22.21s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.978, 'learning_rate': 0.00021299999999999997, 'epoch': 1.6} + 16%|████████████ | 357/2230 [2:10:28<11:33:25, 22.21s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:22:07,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:22:07,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:22:07,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:22:07,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:22:07,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:22:17,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:22:17,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 16%|████████████ | 358/2230 [2:10:49<11:19:29, 21.78s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████ | 358/2230 [2:10:49<11:19:29, 21.78s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9869, 'learning_rate': 0.00021359999999999996, 'epoch': 1.61} + 16%|████████████ | 358/2230 [2:10:49<11:19:29, 21.78s/it]g-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:22:27,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:22:27,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:22:27,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:22:27,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:22:36,295 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:22:36,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:22:36,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████ | 359/2230 [2:11:09<11:06:21, 21.37s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████ | 359/2230 [2:11:09<11:06:21, 21.37s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.983, 'learning_rate': 0.00021419999999999998, 'epoch': 1.61} + 16%|████████████ | 359/2230 [2:11:09<11:06:21, 21.37s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████ | 359/2230 [2:11:09<11:06:21, 21.37s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:22:50,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:22:50,274 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:22:50,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:22:56,471 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:22:56,471 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:22:56,471 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████ | 360/2230 [2:11:29<10:53:04, 20.95s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████ | 360/2230 [2:11:29<10:53:04, 20.95s/it] Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:23:04,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:23:04,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:08,771 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:08,771 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:23:12,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:23:12,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:23:12,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:23:18,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:23:18,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:23:18,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0257, 'learning_rate': 0.00021539999999999998, 'epoch': 1.62} +[WARNING|modeling_utils.py:388] 2022-03-26 19:23:24,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:23:24,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:29,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:29,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:23:33,180 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:23:33,180 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:37,400 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:39,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:39,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9414, 'learning_rate': 0.00021599999999999996, 'epoch': 1.62} +[WARNING|modeling_utils.py:388] 2022-03-26 19:23:43,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:23:45,946 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:23:45,946 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:23:45,946 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:52,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:52,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:52,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:57,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:23:57,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 18:31:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▏ | 363/2230 [2:12:27<10:12:12, 19.67s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 16%|████████████▏ | 363/2230 [2:12:27<10:12:12, 19.67s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:24:03,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:24:05,639 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:24:07,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:24:09,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:24:11,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:24:14,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:24:14,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:24:14,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:17,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:19,881 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:21,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:23,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:25,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:27,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:29,598 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:31,467 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:31,467 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:33,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:35,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:37,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:39,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:42,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:44,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:46,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:46,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:47,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:49,639 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:51,322 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:54,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:56,272 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:57,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:57,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:24:59,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:02,649 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:04,174 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:05,693 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:07,180 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:10,119 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:10,119 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:11,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:14,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:15,792 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:18,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:19,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:22,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:22,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:24,355 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:26,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:27,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:30,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:32,398 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:32,398 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:34,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:36,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:38,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:40,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:40,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:42,373 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:45,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:46,789 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:46,789 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:48,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:50,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:52,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:52,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:53,639 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:56,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:25:56,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:00,523 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:00,523 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:04,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:07,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:07,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:11,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:11,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:14,716 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:14,716 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:18,200 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:21,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:21,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:21,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:25,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:25,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:28,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:32,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:32,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:35,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:35,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:38,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:38,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:42,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:45,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:45,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:49,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:49,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:49,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:53,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:53,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:26:57,184 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:00,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:00,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:03,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:03,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:07,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:10,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:10,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:14,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:17,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:17,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6441, 'learning_rate': 0.00022439999999999998, 'epoch': 1.69} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:20,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:20,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:24,087 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:27,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:27,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:30,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:30,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:34,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:37,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:37,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:40,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:40,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:40,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:44,037 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:47,500 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:47,500 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3542, 'learning_rate': 0.00022559999999999998, 'epoch': 1.7} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.231, 'learning_rate': 0.00022619999999999997, 'epoch': 1.7} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2993, 'learning_rate': 0.00022679999999999998, 'epoch': 1.7} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:27:50,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.2949, 'learning_rate': 0.00022739999999999997, 'epoch': 1.71} +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.2101, 'learning_rate': 0.00022799999999999999, 'epoch': 1.71} +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.3218, 'learning_rate': 0.00022859999999999997, 'epoch': 1.72} +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.1069, 'learning_rate': 0.0002292, 'epoch': 1.72} +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.0308, 'learning_rate': 0.00022979999999999997, 'epoch': 1.73} +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.0998, 'learning_rate': 0.0002304, 'epoch': 1.73} +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.0487, 'learning_rate': 0.00023099999999999998, 'epoch': 1.74} +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:29:14,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0931, 'learning_rate': 0.0002316, 'epoch': 1.74} + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.0471, 'learning_rate': 0.00023219999999999998, 'epoch': 1.74} + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0103, 'learning_rate': 0.0002328, 'epoch': 1.75} + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0283, 'learning_rate': 0.00023339999999999998, 'epoch': 1.75} + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0728, 'learning_rate': 0.000234, 'epoch': 1.76} + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0685, 'learning_rate': 0.00023459999999999998, 'epoch': 1.76} + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0599, 'learning_rate': 0.0002352, 'epoch': 1.77} + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 388/2230 [2:20:59<13:06:03, 25.60s/it]g-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:35:21,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:35:21,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:35:21,300 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0421, 'learning_rate': 0.00023579999999999999, 'epoch': 1.77} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:35:21,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:35:21,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:35:21,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:35:21,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:35:21,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:35:21,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:35:21,300 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:35:21,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:35:21,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:35:21,300 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▎ | 396/2230 [2:24:15<12:25:51, 24.40s/it] Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▎ | 396/2230 [2:24:15<12:25:51, 24.40s/it] Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9362, 'learning_rate': 0.0002364, 'epoch': 1.78} +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.08, 'learning_rate': 0.000237, 'epoch': 1.78} +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.0144, 'learning_rate': 0.0002376, 'epoch': 1.78} +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:35:51,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0673, 'learning_rate': 0.0002382, 'epoch': 1.79} +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0356, 'learning_rate': 0.0002388, 'epoch': 1.79} +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9756, 'learning_rate': 0.0002394, 'epoch': 1.8} +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9296, 'learning_rate': 0.00023999999999999998, 'epoch': 1.8} +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:36:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:38:29,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:38:29,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.9861, 'learning_rate': 0.0002406, 'epoch': 1.81} +[WARNING|modeling_utils.py:388] 2022-03-26 19:38:29,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:38:35,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:38:35,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:38:35,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:38:35,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:38:35,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:38:35,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:38:35,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:38:35,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:38:51,969 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:38:51,969 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:38:51,969 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0462, 'learning_rate': 0.00024119999999999998, 'epoch': 1.81} +[WARNING|modeling_utils.py:388] 2022-03-26 19:38:51,969 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:38:51,969 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:38:51,969 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:04,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:04,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:04,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:04,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:04,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:04,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.82, 'learning_rate': 0.0002418, 'epoch': 1.82} +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:04,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:04,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:04,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:04,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:04,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:39:27,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:39:27,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:39:27,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:39:27,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:39:27,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:39:27,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9398, 'learning_rate': 0.00024239999999999998, 'epoch': 1.82} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:39:39,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:39:39,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:39:39,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:39:39,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:47,373 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:47,373 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:51,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:51,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:51,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:51,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:39:51,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9655, 'learning_rate': 0.000243, 'epoch': 1.83} +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:51,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:39:51,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:40:05,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:40:05,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:40:05,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:40:12,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:40:12,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:40:16,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:40:16,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:40:16,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:40:16,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8895, 'learning_rate': 0.00024359999999999999, 'epoch': 1.83} +[WARNING|modeling_utils.py:388] 2022-03-26 19:40:23,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:40:23,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:40:28,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:40:28,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:40:28,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:40:28,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:40:28,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:40:38,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:40:38,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:40:38,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0266, 'learning_rate': 0.00024419999999999997, 'epoch': 1.83} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:40:38,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:40:46,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:40:46,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:40:46,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:40:46,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:40:54,823 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:23:59,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 19:40:58,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:41:02,768 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:41:08,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 19:41:13,336 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:41:17,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.9576, 'learning_rate': 0.00024539999999999995, 'epoch': 1.84}
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:41:25,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:41:29,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:41:31,701 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 19:41:35,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 18%|█████████████▊ | 412/2230 [2:30:05<10:07:30, 20.05s/it] +[WARNING|modeling_utils.py:388] 2022-03-26 19:41:39,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:41:42,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 19:41:46,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:41:50,539 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:41:53,858 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:41:56,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9656, 'learning_rate': 0.0002466, 'epoch': 1.85} +[WARNING|modeling_utils.py:388] 2022-03-26 19:41:59,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:42:02,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:42:04,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 19:42:06,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:10,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:12,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████ | 414/2230 [2:30:41<9:34:30, 18.98s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:42:14,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:16,418 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:18,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:20,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:22,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:24,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:26,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:27,950 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... + 19%|██████████████▏ | 415/2230 [2:30:57<9:03:06, 17.95s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:42:29,925 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:31,776 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:33,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:35,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:37,289 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:40,900 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:42,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 19%|██████████████▏ | 416/2230 [2:31:11<8:33:00, 16.97s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:42:44,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:46,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:47,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:49,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:53,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:54,623 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:42:56,221 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... + 19%|██████████████▏ | 417/2230 [2:31:25<8:00:47, 15.91s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:42:57,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:01,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:02,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:04,170 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:07,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:08,653 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:11,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:13,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:14,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:17,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:19,176 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:20,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:23,121 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:24,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:26,729 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:29,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▎ | 420/2230 [2:31:59<6:20:25, 12.61s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:43:31,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:33,539 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:34,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:36,629 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▎ | 421/2230 [2:32:07<5:41:46, 11.34s/it] +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:41,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:43,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:45,116 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 19%|██████████████▍ | 422/2230 [2:32:14<5:04:23, 10.10s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:43:46,885 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:49,250 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:51,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 423/2230 [2:32:20<4:27:26, 8.88s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:43:53,850 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:43:57,462 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:00,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:04,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:07,876 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:11,296 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:14,738 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:18,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 424/2230 [2:32:48<7:19:36, 14.61s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:44:21,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:25,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:28,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:31,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:35,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:38,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:42,050 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:44:21,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:42,050 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:44:21,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:45,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:44:21,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 425/2230 [2:33:16<9:21:36, 18.67s/it] Setting `use_cache=False`...1] 2022-03-26 19:44:21,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 425/2230 [2:33:16<9:21:36, 18.67s/it] Setting `use_cache=False`...1] 2022-03-26 19:44:21,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 425/2230 [2:33:16<9:21:36, 18.67s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:44:49,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 425/2230 [2:33:16<9:21:36, 18.67s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:44:49,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:53,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:44:49,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:56,649 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:44:49,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:44:56,649 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:44:49,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:00,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:44:49,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:03,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:44:49,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:03,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:44:49,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:06,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:44:49,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:06,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:44:49,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:10,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:13,410 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▎ | 426/2230 [2:33:43<10:36:06, 21.16s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:45:16,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:20,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:23,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:26,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:29,858 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:33,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:36,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:39,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▎ | 427/2230 [2:34:09<11:20:58, 22.66s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:45:42,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:46,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:45:49,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.4286, 'learning_rate': 0.0002556, 'epoch': 1.92} +{'loss': 4.3649, 'learning_rate': 0.0002562, 'epoch': 1.92} +{'loss': 4.143, 'learning_rate': 0.00025679999999999995, 'epoch': 1.93} +{'loss': 4.1339, 'learning_rate': 0.00025739999999999997, 'epoch': 1.93} + 19%|██████████████▌ | 432/2230 [2:36:16<12:23:20, 24.81s/it]
+{'loss': 3.9971, 'learning_rate': 0.0002586, 'epoch': 1.94} +{'loss': 4.0285, 'learning_rate': 0.00025919999999999996, 'epoch': 1.95} +[WARNING|modeling_utils.py:388] 2022-03-26 19:49:02,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:49:25,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.9439, 'learning_rate': 0.000261, 'epoch': 1.96}
+[WARNING|modeling_utils.py:388] 2022-03-26 19:49:48,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:50:04,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 20%|██████████████▋ | 438/2230 [2:38:36<11:32:21, 23.18s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:50:09,085 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.91, 'learning_rate': 0.00026159999999999996, 'epoch': 1.96}
+ 20%|██████████████▋ | 438/2230 [2:38:36<11:32:21, 23.18s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:50:09,085 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:50:20,993 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 19:50:29,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.819, 'learning_rate': 0.0002622, 'epoch': 1.97}
+[WARNING|modeling_utils.py:388] 2022-03-26 19:50:33,207 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:50:39,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:50:47,824 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 20%|██████████████▊ | 440/2230 [2:39:17<10:52:42, 21.88s/it]
+{'loss': 3.8625, 'learning_rate': 0.0002628, 'epoch': 1.97}
+ 20%|██████████████▊ | 440/2230 [2:39:17<10:52:42, 21.88s/it]
+[WARNING|modeling_utils.py:388] 2022-03-26 19:50:55,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:50:58,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:51:04,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:08,270 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 19:51:12,186 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:51:14,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:51:16,616 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:51:18,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:22,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:24,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 20%|███████████████ | 442/2230 [2:39:54<9:54:25, 19.95s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:51:26,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:28,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:30,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:32,467 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:34,356 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:36,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:37,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 20%|███████████████ | 443/2230 [2:40:08<9:08:05, 18.40s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:51:41,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:43,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:44,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:46,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:49,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:51,284 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 20%|███████████████▏ | 444/2230 [2:40:21<8:17:32, 16.71s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:55,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:56,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:51:58,944 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:01,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 20%|███████████████▏ | 445/2230 [2:40:30<7:10:17, 14.46s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:52:03,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:05,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:06,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:09,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 20%|███████████████▏ | 446/2230 [2:40:37<6:00:53, 12.14s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:52:10,873 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:14,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:18,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:21,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:25,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:28,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:32,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:35,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 20%|███████████████▏ | 447/2230 [2:41:06<8:28:44, 17.12s/it]
+ 20%|███████████████▏ | 447/2230 [2:41:06<8:28:44, 17.12s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▏ | 447/2230 [2:41:06<8:28:44, 17.12s/it][WARNING|modeling_bart.py:1051] 2022-03-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:42,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:42,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:46,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:49,787 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:49,787 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:53,225 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:52:56,673 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:53:00,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:53:00,070 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +{'loss': 5.3123, 'learning_rate': 0.0002676, 'epoch': 2.01} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:53:00,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:53:00,070 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:53:00,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.9094, 'learning_rate': 0.00026819999999999996, 'epoch': 2.01} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:53:00,070 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:53:00,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:53:00,070 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:53:00,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4246, 'learning_rate': 0.0002688, 'epoch': 2.02} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:53:00,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:53:00,070 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:53:00,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.3338, 'learning_rate': 0.0002694, 'epoch': 2.02} +[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.0653, 'learning_rate': 0.00027, 'epoch': 2.03} +[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.9293, 'learning_rate': 0.00027059999999999996, 'epoch': 2.03} +[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.924, 'learning_rate': 0.0002712, 'epoch': 2.04} +[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.8679, 'learning_rate': 0.0002718, 'epoch': 2.04} +[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.8316, 'learning_rate': 0.0002724, 'epoch': 2.04} +[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 19:54:12,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.6439, 'learning_rate': 0.00027299999999999997, 'epoch': 2.05} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.6644, 'learning_rate': 0.0002736, 'epoch': 2.05} +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 19:57:08,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4917, 'learning_rate': 0.0002742, 'epoch': 2.06} + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.5827, 'learning_rate': 0.0002748, 'epoch': 2.06} + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.5428, 'learning_rate': 0.00027539999999999997, 'epoch': 2.07} + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.5134, 'learning_rate': 0.000276, 'epoch': 2.07} + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.5094, 'learning_rate': 0.0002766, 'epoch': 2.08} + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.4225, 'learning_rate': 0.0002772, 'epoch': 2.08} + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▍ | 459/2230 [2:46:23<12:37:49, 25.67s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 465/2230 [2:48:52<12:13:22, 24.93s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 465/2230 [2:48:52<12:13:22, 24.93s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4751, 'learning_rate': 0.0002778, 'epoch': 2.09} + 21%|███████████████▋ | 465/2230 [2:48:52<12:13:22, 24.93s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 465/2230 [2:48:52<12:13:22, 24.93s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 465/2230 [2:48:52<12:13:22, 24.93s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 465/2230 [2:48:52<12:13:22, 24.93s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 465/2230 [2:48:52<12:13:22, 24.93s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 465/2230 [2:48:52<12:13:22, 24.93s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 21%|███████████████▋ | 465/2230 [2:48:52<12:13:22, 24.93s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 465/2230 [2:48:52<12:13:22, 24.93s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 465/2230 [2:48:52<12:13:22, 24.93s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 465/2230 [2:48:52<12:13:22, 24.93s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 466/2230 [2:49:17<12:08:02, 24.76s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 466/2230 [2:49:17<12:08:02, 24.76s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3967, 'learning_rate': 0.0002784, 'epoch': 2.09} + 21%|███████████████▋ | 466/2230 [2:49:17<12:08:02, 24.76s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 466/2230 [2:49:17<12:08:02, 24.76s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 466/2230 [2:49:17<12:08:02, 24.76s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3292, 'learning_rate': 0.000279, 'epoch': 2.09} +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:00,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:33,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:33,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:33,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:33,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3373, 'learning_rate': 0.00027959999999999997, 'epoch': 2.1} +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:33,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 20:01:33,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:01:33,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3579, 'learning_rate': 0.0002802, 'epoch': 2.1} +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:01:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3337, 'learning_rate': 0.0002808, 'epoch': 2.11} + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3649, 'learning_rate': 0.00028139999999999996, 'epoch': 2.11} + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 470/2230 [2:50:54<11:52:01, 24.27s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2899, 'learning_rate': 0.00028199999999999997, 'epoch': 2.12} + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.2992, 'learning_rate': 0.0002826, 'epoch': 2.12} + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▊ | 472/2230 [2:51:40<11:35:36, 23.74s/it] Setting `use_cache=False`...e computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:03:49,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:03:49,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:03:49,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 20:03:49,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:03:49,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:03:49,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.197, 'learning_rate': 0.00028319999999999994, 'epoch': 2.13} +[WARNING|modeling_utils.py:388] 2022-03-26 20:03:49,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:03:49,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:03:49,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:03:49,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 20:03:49,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:03:49,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:03:49,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:16,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:16,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:16,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:16,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 20:04:16,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2116, 'learning_rate': 0.00028379999999999996, 'epoch': 2.13} +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:16,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:16,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:16,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:16,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:16,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:37,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 20:04:37,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:37,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:37,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:37,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.251, 'learning_rate': 0.0002844, 'epoch': 2.13} +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:37,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:49,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:49,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 20:04:53,934 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:53,934 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:53,934 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:04:53,934 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:02,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:02,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:06,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 20:05:06,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2578, 'learning_rate': 0.000285, 'epoch': 2.14} +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:10,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:10,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:10,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:10,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:10,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:10,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 20:05:10,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:10,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:10,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:10,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:10,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2191, 'learning_rate': 0.00028559999999999995, 'epoch': 2.14} +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:10,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:10,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 20:05:10,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:38,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:38,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:38,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:38,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:38,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:38,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 20:05:38,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.1944, 'learning_rate': 0.00028619999999999996, 'epoch': 2.15} +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:38,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:38,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:57,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:57,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:57,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:05:57,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 20:05:57,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:06:07,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.2094, 'learning_rate': 0.0002868, 'epoch': 2.15} +[WARNING|modeling_utils.py:388] 2022-03-26 20:06:07,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:06:17,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:06:17,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:06:27,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:06:27,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.1629, 'learning_rate': 0.00028739999999999994, 'epoch': 2.16} +[WARNING|modeling_utils.py:388] 2022-03-26 20:06:27,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 20:06:38,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 20:06:38,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:06:50,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.114, 'learning_rate': 0.00028799999999999995, 'epoch': 2.16} +[WARNING|modeling_bart.py:1051] 2022-03-26 20:06:54,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 20:07:02,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:07:08,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:07:11,113 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.1973, 'learning_rate': 0.00028859999999999997, 'epoch': 2.17} +[WARNING|modeling_utils.py:388] 2022-03-26 20:07:11,113 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:07:17,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:07:23,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:07:25,414 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 20:07:29,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 20:07:29,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.0682, 'learning_rate': 0.0002892, 'epoch': 2.17} +[WARNING|modeling_utils.py:388] 2022-03-26 20:07:33,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 20:07:37,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:07:41,510 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:07:43,769 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:07:43,769 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 20:07:47,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:07:50,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 20:07:53,965 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:07:56,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:07:58,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-26 20:08:02,287 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 20:08:04,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 486/2230 [2:56:34<9:13:51, 19.05s/it] +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:08,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:08:10,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:12,412 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:14,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:16,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:18,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:20,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:22,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:08:22,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:24,675 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:26,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:28,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:34,322 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:08:36,197 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:38,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:40,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:41,913 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:43,715 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:45,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:08:47,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:50,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:52,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:54,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:56,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:08:57,876 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:09:01,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:02,786 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:04,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:05,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:09,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:10,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:09:12,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:15,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:16,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:19,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:20,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:23,695 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:09:25,057 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:26,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:28,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:31,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:32,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:35,066 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:09:37,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:38,479 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:40,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:42,894 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:44,924 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:47,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:09:49,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:49,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:51,555 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:53,310 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:55,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:55,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:09:57,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:09:59,809 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:01,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:01,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:03,541 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:03,541 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:07,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:07,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:10:10,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:10,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:14,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:17,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:17,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:21,259 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:21,259 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:10:24,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:28,269 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:28,269 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 6.3452, 'learning_rate': 0.00029699999999999996, 'epoch': 2.23} +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:31,902 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:31,902 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:35,378 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:35,378 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:10:38,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:42,330 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:42,330 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:45,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:45,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:49,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:52,722 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:10:52,722 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:56,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:56,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 5.654, 'learning_rate': 0.00029759999999999997, 'epoch': 2.23} +[WARNING|modeling_utils.py:388] 2022-03-26 20:10:59,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:03,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:03,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:06,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:11:06,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:09,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:13,331 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:13,331 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:16,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:16,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:20,136 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:11:23,473 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:23,473 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:23,473 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:26,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:26,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:30,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:33,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:11:33,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:36,895 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:36,895 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:40,197 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:43,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:43,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:46,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 20:11:50,215 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-26 20:11:50,215 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 0/331 [00:00 + 1%|▌ | 2/331 [00:01<03:27, 1.59it/s] + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]
+ 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 1%|█ | 4/331 [00:03<05:11, 1.05it/s]g-point operations will not be computed-26 19:52:39,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+03/26/2022 20:21:39 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow +{'eval_loss': 3.9967808723449707, 'eval_wer': 1.6151527171757238, 'eval_runtime': 586.6262, 'eval_samples_per_second': 4.504, 'eval_steps_per_second': 0.564, 'epoch': 2.24} +03/26/2022 20:22:58 - WARNING - huggingface_hub.repository - Adding files tracked by Git LFS: ['wandb/run-20220326_171130-bdf5nvyg/run-bdf5nvyg.wandb']. This may take a bit of time if the files are large.