0%| | 0/2230 [00:00> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:47:56,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:47:57,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:47:58,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:47:59,398 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:00,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:01,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:01,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:03,171 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:03,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:05,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:05,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:06,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:48:07,598 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:08,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:09,503 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:10,729 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:11,392 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:12,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:13,272 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:14,474 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:15,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:16,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:16,998 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:18,205 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:48:18,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:20,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:20,663 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:21,873 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:22,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:23,714 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.038, 'learning_rate': 6.000000000000001e-08, 'epoch': 0.0} [WARNING|modeling_utils.py:388] 2022-03-27 19:48:24,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%| | 1/2230 [00:30<19:02:57, 30.77s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:48:25,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:26,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:27,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:28,107 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:29,298 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:48:29,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:31,118 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:31,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:32,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:33,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:34,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:35,363 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:36,563 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:37,207 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:38,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:39,029 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:40,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:48:40,794 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:41,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:42,596 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:43,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:44,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:45,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:46,165 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:47,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:47,937 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:49,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:49,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:50,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:48:51,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.0401, 'learning_rate': 1.2000000000000002e-07, 'epoch': 0.01} [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:52,699 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:53,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%| | 2/2230 [00:59<18:22:52, 29.70s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:48:54,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:55,255 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:56,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:57,023 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:58,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:48:58,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:48:59,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:00,606 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:01,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:49:02,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:03,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:04,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:05,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:06,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:07,169 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:07,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:08,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:09,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:10,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:11,351 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:12,506 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:49:13,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:14,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:14,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:16,106 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:16,718 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:17,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:18,472 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:19,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:20,244 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.0331, 'learning_rate': 1.8e-07, 'epoch': 0.01} [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:21,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:22,020 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%| | 3/2230 [01:28<18:05:11, 29.24s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:49:23,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:49:23,852 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:24,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:25,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:26,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:27,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:28,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:29,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:30,220 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:30,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:31,993 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:32,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:33,741 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:49:34,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:35,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:36,091 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:37,239 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:37,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:38,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:39,605 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:40,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:41,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:42,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:43,166 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:44,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:49:44,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:46,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:46,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:47,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:48,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:49,546 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0293, 'learning_rate': 2.4000000000000003e-07, 'epoch': 0.02} [WARNING|modeling_utils.py:388] 2022-03-27 19:49:50,171 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▏ | 4/2230 [01:56<17:48:47, 28.81s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:49:51,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:52,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:53,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:53,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:54,870 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:49:55,479 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:56,599 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:57,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:49:58,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:49:58,949 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:00,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:00,693 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:01,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:02,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:03,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:04,198 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:05,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:50:05,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:07,047 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:07,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:08,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:09,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:10,444 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:11,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:12,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:12,788 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:13,912 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:14,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:15,665 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:50:16,289 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:17,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:18,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▏ | 5/2230 [02:24<17:36:13, 28.48s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:50:19,320 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:19,924 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:21,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:21,659 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:22,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:23,376 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:24,528 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:25,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:26,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:50:26,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:28,020 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:28,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:29,734 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:30,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:31,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:32,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:33,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:33,729 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:34,853 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:35,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:36,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:50:37,198 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:38,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:38,920 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:40,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:40,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:41,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:42,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:43,437 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:44,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:45,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:45,762 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▏ | 6/2230 [02:52<17:25:46, 28.21s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:50:46,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0413, 'learning_rate': 3.6e-07, 'epoch': 0.03} [WARNING|modeling_utils.py:388] 2022-03-27 19:50:47,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:48,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:49,323 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:50,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:51,042 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:52,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:53,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:54,767 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:55,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:56,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:50:57,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:58,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:50:58,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:50:59,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:00,331 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:01,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:02,028 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:03,118 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:03,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:04,828 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:05,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:06,514 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:07,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:08,199 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:51:08,791 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:09,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:10,479 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:11,563 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:12,156 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.0273, 'learning_rate': 4.2e-07, 'epoch': 0.03} [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:13,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:13,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▏ | 7/2230 [03:20<17:23:38, 28.17s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:51:15,118 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:15,733 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:16,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:17,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:18,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:51:19,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:20,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:20,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:21,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:22,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:23,711 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:24,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:25,381 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:25,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:27,085 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:27,678 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:28,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:51:29,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:30,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:31,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:32,213 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:32,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:33,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:34,520 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:35,623 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:36,215 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:37,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:37,906 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:38,998 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:51:39,590 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:40,694 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0341, 'learning_rate': 4.800000000000001e-07, 'epoch': 0.04} [WARNING|modeling_utils.py:388] 2022-03-27 19:51:41,305 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▎ | 8/2230 [03:47<17:14:51, 27.94s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:51:42,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:43,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:44,242 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:44,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:45,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:46,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:47,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:48,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:49,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:51:49,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:50,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:51,557 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:52,673 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:53,277 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:54,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:55,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:56,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:56,708 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:57,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:51:58,400 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:51:59,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:52:00,080 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:01,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:01,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:02,808 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:03,397 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:04,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:05,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:06,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:06,781 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.0409, 'learning_rate': 5.4e-07, 'epoch': 0.04} [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:07,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:08,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▎ | 9/2230 [04:14<17:05:34, 27.71s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:52:09,686 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:52:10,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:11,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:11,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:13,062 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:13,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:14,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:15,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:16,442 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:17,028 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:18,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:18,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:19,790 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:52:20,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:21,469 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:22,068 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:23,138 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:23,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:24,832 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:25,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:26,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:27,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:28,182 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:28,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:29,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:52:30,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:31,498 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:32,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:33,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:33,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:34,878 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:35,478 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▎ | 10/2230 [04:41<16:56:55, 27.48s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:52:36,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:37,260 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:38,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:38,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:39,927 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:52:40,490 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:41,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:42,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:43,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:43,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:44,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:45,393 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:47,079 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:48,149 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:48,740 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:49,815 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:52:50,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:51,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:52,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:53,138 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:53,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:54,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:55,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:56,469 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:57,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:58,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:52:58,721 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:52:59,811 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:53:00,399 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:01,498 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:02,083 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▍ | 11/2230 [05:08<16:46:44, 27.22s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:53:03,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:03,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:04,947 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:05,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:06,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:07,180 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:08,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:08,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:09,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:53:10,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:11,593 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:12,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:13,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:13,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:14,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:15,509 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:16,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:17,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:18,207 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:18,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:19,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:53:20,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:21,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:22,083 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:23,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:23,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:24,792 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:25,378 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:26,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:26,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:28,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:28,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▍ | 12/2230 [05:35<16:38:37, 27.01s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:53:29,851 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:53:30,414 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:31,478 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:32,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:33,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:33,731 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:34,811 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:35,399 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:36,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:37,027 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:38,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:38,659 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:39,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:53:40,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:41,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:42,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:43,948 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:44,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:45,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:46,165 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:47,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:47,837 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:48,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:49,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:50,538 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:53:51,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:52,200 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:52,786 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:53,850 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:54,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:55,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:56,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.0364, 'learning_rate': 7.799999999999999e-07, 'epoch': 0.06} 1%|▍ | 13/2230 [06:02<16:42:48, 27.14s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:53:57,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:57,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:53:58,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:53:59,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:00,598 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:54:01,185 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:02,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:02,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:03,888 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:04,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:05,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:06,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:07,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:07,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:08,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:09,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:10,401 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:54:10,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:12,006 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:12,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:13,629 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:14,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:15,284 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:15,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:16,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:17,484 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:18,521 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:19,073 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:20,101 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:54:20,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:21,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:22,317 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▍ | 14/2230 [06:28<16:32:24, 26.87s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:54:23,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:24,062 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:25,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:25,676 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:26,717 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:27,279 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:28,325 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:28,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:29,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:54:30,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:31,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:32,162 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:33,216 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:33,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:34,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:35,439 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:36,506 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:37,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:38,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:38,697 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:39,734 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:54:40,293 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:41,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:41,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:42,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:43,469 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:44,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:45,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:46,116 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:46,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:47,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:48,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▌ | 15/2230 [06:54<16:22:18, 26.61s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:54:49,459 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:54:50,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:51,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:51,629 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:52,671 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:53,230 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:54,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:54,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:55,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:56,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:57,524 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:54:58,101 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:54:59,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:54:59,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:00,778 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:01,354 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:02,378 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:02,927 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:03,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:04,502 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:05,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:06,078 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:07,101 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:07,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:08,696 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:55:09,280 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:10,309 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:10,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:11,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:12,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:13,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:14,078 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.0239, 'learning_rate': 9.600000000000001e-07, 'epoch': 0.07} 1%|▌ | 16/2230 [07:20<16:12:24, 26.35s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:55:15,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:15,808 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:16,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:17,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:18,438 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:55:19,012 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:20,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:20,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:21,653 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:22,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:23,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:23,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:24,910 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:25,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:26,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:27,043 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:28,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:55:28,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:29,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:30,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:31,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:31,819 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:32,849 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:33,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:34,474 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:35,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:36,087 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:36,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:37,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:55:38,230 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:39,254 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:39,814 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▌ | 17/2230 [07:46<16:05:08, 26.17s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:55:40,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:41,484 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:42,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:43,051 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:44,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:44,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:45,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:46,241 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:47,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:55:47,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:48,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:49,412 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:50,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:50,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:52,008 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:52,551 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:53,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:54,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:55,160 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:55,745 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:56,770 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:55:57,341 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:58,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:55:58,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:55:59,974 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:56:00,541 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:56:01,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:56:02,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:56:03,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:56:03,717 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:56:04,735 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:56:05,289 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▋ | 18/2230 [08:11<15:57:02, 25.96s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:56:06,413 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0328, 'learning_rate': 1.08e-06, 'epoch': 0.08} [WARNING|modeling_utils.py:388] 2022-03-27 19:56:06,964 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:56:07,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:56:08,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:56:09,536 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:56:10,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:56:11,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:56:11,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:56:12,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:56:13,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:56:14,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:56:14,927 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:56:15,941 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:56:16,489 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:56:17,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:56:18,047 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:56:19,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:56:19,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:56:20,634 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:56:21,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:56:22,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:56:22,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:56:23,802 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 19:56:25,288 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 19:56:26,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 19:56:26,850 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 19:56:27,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 19:56:28,403 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 19:56:29,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 19:56:29,976 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 19:56:30,993 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 19:56:31,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
1%|▋ | 19/2230 [08:37<16:00:01, 26.05s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 19:56:32,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 19:56:33,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 19:56:35,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:56:38,952 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:56:42,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:56:45,213 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:56:48,371 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:56:51,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:56:54,632 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
1%|▋ | 20/2230 [09:03<15:50:12, 25.80s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 19:56:57,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:00,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:04,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:07,308 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:10,390 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:13,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:16,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:19,765 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
1%|▋ | 21/2230 [09:28<15:42:12, 25.59s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:22,988 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0308, 'learning_rate': 1.26e-06, 'epoch': 0.09}
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:26,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:29,166 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:32,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:35,351 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:38,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:41,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:44,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
1%|▊ | 22/2230 [09:52<15:32:18, 25.33s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:47,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0488, 'learning_rate': 1.3199999999999999e-06, 'epoch': 0.1}
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:50,851 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:53,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:57:56,921 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:00,029 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:03,085 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:06,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:09,164 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
1%|▊ | 23/2230 [10:17<15:24:26, 25.13s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:12,356 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:15,401 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:18,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:21,487 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:24,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:27,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:30,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:33,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
1%|▊ | 24/2230 [10:42<15:15:33, 24.90s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:36,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:39,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:42,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:45,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:48,768 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:51,752 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:54,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 19:58:57,770 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
1%|▊ | 25/2230 [11:07<15:16:57, 24.95s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.027, 'learning_rate': 1.5599999999999999e-06, 'epoch': 0.12}
1%|▉ | 27/2230 [11:55<14:59:36, 24.50s/it]
{'loss': 0.0333, 'learning_rate': 1.62e-06, 'epoch': 0.12}
{'loss': 0.0247, 'learning_rate': 1.68e-06, 'epoch': 0.13}
1%|█ | 29/2230 [12:42<14:41:45, 24.04s/it]
{'loss': 0.0298, 'learning_rate': 1.74e-06, 'epoch': 0.13}
{'loss': 0.0264, 'learning_rate': 1.8e-06, 'epoch': 0.13}
{'loss': 0.0255, 'learning_rate': 1.86e-06, 'epoch': 0.14} 1%|█ | 29/2230 [12:42<14:41:45, 24.04s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 29/2230 [12:42<14:41:45, 24.04s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 29/2230 [12:42<14:41:45, 24.04s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 29/2230 [12:42<14:41:45, 24.04s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 29/2230 [12:42<14:41:45, 24.04s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 29/2230 [12:42<14:41:45, 24.04s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 29/2230 [12:42<14:41:45, 24.04s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 29/2230 [12:42<14:41:45, 24.04s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 29/2230 [12:42<14:41:45, 24.04s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|█ | 32/2230 [13:52<14:22:20, 23.54s/it][WARNING|modeling_bart.py:1051] 2022-03-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0322, 'learning_rate': 1.9200000000000003e-06, 'epoch': 0.14}
[WARNING|modeling_utils.py:388] 2022-03-27 20:01:58,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 20:02:05,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0279, 'learning_rate': 1.98e-06, 'epoch': 0.15}
[WARNING|modeling_utils.py:388] 2022-03-27 20:02:11,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:02:27,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.021, 'learning_rate': 2.0400000000000004e-06, 'epoch': 0.15}
[WARNING|modeling_utils.py:388] 2022-03-27 20:02:39,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:02:43,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
2%|█▏ | 35/2230 [14:57<13:35:39, 22.30s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 20:02:57,917 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0335, 'learning_rate': 2.16e-06, 'epoch': 0.16}
[WARNING|modeling_utils.py:388] 2022-03-27 20:03:20,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 20:03:28,694 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:03:34,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 20:03:42,624 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 20:03:50,878 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 20:03:57,171 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 20:04:00,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 20:04:03,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:04:07,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:04:13,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0277, 'learning_rate': 2.34e-06, 'epoch': 0.17}
[WARNING|modeling_utils.py:388] 2022-03-27 20:04:19,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:04:21,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 20:04:25,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:04:29,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:04:31,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0281, 'learning_rate': 2.4000000000000003e-06, 'epoch': 0.18}
[WARNING|modeling_bart.py:1051] 2022-03-27 20:04:35,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 20:04:38,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:04:41,745 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:04:43,894 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:04:46,021 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:04:48,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:04:50,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:04:52,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:04:54,469 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:04:56,498 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:04:58,511 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:00,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:02,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:04,453 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:06,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:08,444 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:10,312 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:12,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:14,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:15,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:17,784 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:19,629 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:21,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:23,339 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:26,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:28,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:30,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:32,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:34,153 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:35,945 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:39,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:40,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:42,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:45,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:46,694 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:48,272 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:51,083 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:52,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:55,183 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:57,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:05:59,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:01,686 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:02,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:05,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:07,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:08,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:10,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:13,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:15,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:17,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:19,160 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:20,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:23,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:25,489 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:27,894 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:29,356 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:30,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:33,341 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:37,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:40,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:44,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:48,160 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:51,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:55,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:06:59,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0455, 'learning_rate': 3.06e-06, 'epoch': 0.23}
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:02,668 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:06,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:09,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:13,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:16,974 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:20,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:24,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:27,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0386, 'learning_rate': 3.1199999999999998e-06, 'epoch': 0.23}
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:31,160 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:34,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:38,232 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:41,753 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:45,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:48,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:52,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:55,752 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:07:59,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:08:02,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:08:06,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:08:09,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:08:13,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0309, 'learning_rate': 3.24e-06, 'epoch': 0.24}
{'loss': 0.0339, 'learning_rate': 3.3e-06, 'epoch': 0.25}
3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]
{'loss': 0.0392, 'learning_rate': 3.36e-06, 'epoch': 0.25}
{'loss': 0.0182, 'learning_rate': 3.4200000000000003e-06, 'epoch': 0.26}
{'loss': 0.0316, 'learning_rate': 3.48e-06, 'epoch': 0.26}
{'loss': 0.0394, 'learning_rate': 3.54e-06, 'epoch': 0.26}
{'loss': 0.0279, 'learning_rate': 3.6e-06, 'epoch': 0.27}
{'loss': 0.0329, 'learning_rate': 3.66e-06, 'epoch': 0.27}
{'loss': 0.0276, 'learning_rate': 3.72e-06, 'epoch': 0.28}
{'loss': 0.0289, 'learning_rate': 3.7800000000000002e-06, 'epoch': 0.28}
{'loss': 0.032, 'learning_rate': 3.8400000000000005e-06, 'epoch': 0.29}
{'loss': 0.0258, 'learning_rate': 3.9e-06, 'epoch': 0.29}
{'loss': 0.0301, 'learning_rate': 3.96e-06, 'epoch': 0.3}
{'loss': 0.0225, 'learning_rate': 4.0200000000000005e-06, 'epoch': 0.3}
{'loss': 0.0345, 'learning_rate': 4.080000000000001e-06, 'epoch': 0.3}
{'loss': 0.0295, 'learning_rate': 4.14e-06, 'epoch': 0.31}
{'loss': 0.0295, 'learning_rate': 4.2000000000000004e-06, 'epoch': 0.31}
3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0222, 'learning_rate': 4.26e-06, 'epoch': 0.32} 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|█▉ | 56/2230 [21:26<15:29:38, 25.66s/it]g-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
3%|██▌ | 72/2230 [28:26<15:10:57, 25.33s/it]
3%|██▌ | 73/2230 [28:51<15:02:04, 25.09s/it]
{'loss': 0.0318, 'learning_rate': 4.3799999999999996e-06, 'epoch': 0.33}
{'loss': 0.0178, 'learning_rate': 4.44e-06, 'epoch': 0.33}
{'loss': 0.0304, 'learning_rate': 4.5e-06, 'epoch': 0.34}
{'loss': 0.0237, 'learning_rate': 4.56e-06, 'epoch': 0.34}
{'loss': 0.024, 'learning_rate': 4.62e-06, 'epoch': 0.35}
[WARNING|modeling_utils.py:388] 2022-03-27 20:18:34,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
3%|██▋ | 78/2230 [30:50<14:20:13, 23.98s/it]
{'loss': 0.0356, 'learning_rate': 4.68e-06, 'epoch': 0.35}
[WARNING|modeling_utils.py:388] 2022-03-27 20:18:59,185 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.025, 'learning_rate': 4.74e-06, 'epoch': 0.35}
[WARNING|modeling_bart.py:1051] 2022-03-27 20:19:30,135 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0274, 'learning_rate': 4.800000000000001e-06, 'epoch': 0.36}
[WARNING|modeling_utils.py:388] 2022-03-27 20:19:46,363 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:19:50,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0263, 'learning_rate': 4.86e-06, 'epoch': 0.36}
[WARNING|modeling_utils.py:388] 2022-03-27 20:20:16,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0253, 'learning_rate': 4.92e-06, 'epoch': 0.37}
[WARNING|modeling_bart.py:1051] 2022-03-27 20:20:31,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 20:20:31,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 4%|██▉ | 83/2230 [32:45<13:40:58, 22.94s/it] Setting `use_cache=False`...e computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 4%|██▉ | 83/2230 [32:45<13:40:58, 22.94s/it] Setting `use_cache=False`...e computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0262, 'learning_rate': 4.980000000000001e-06, 'epoch': 0.37} 4%|██▉ | 83/2230 [32:45<13:40:58, 22.94s/it] Setting `use_cache=False`...e computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:20:45,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:20:45,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:20:49,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:20:49,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:20:49,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:20:49,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:20:49,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:20:49,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:20:49,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0213, 'learning_rate': 5.04e-06, 'epoch': 0.38} [WARNING|modeling_utils.py:388] 2022-03-27 20:20:49,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:20:49,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 20:20:49,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:20:49,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:20:49,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:20:49,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:20:49,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:20:49,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:20:49,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 19:59:01,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
4%|██▉ | 85/2230 [33:28<13:10:52, 22.12s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 20:21:24,355 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:21:34,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
4%|███ | 86/2230 [33:48<12:55:30, 21.70s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 20:21:45,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 20:21:53,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 20:21:59,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0269, 'learning_rate': 5.22e-06, 'epoch': 0.39}
[WARNING|modeling_bart.py:1051] 2022-03-27 20:22:09,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 20:22:17,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:22:21,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:22:25,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:22:27,784 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:22:33,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 20:22:37,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:22:41,825 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:22:44,232 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 20:22:48,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:22:52,185 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:22:54,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:22:56,645 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
4%|███▏ | 90/2230 [35:06<11:42:51, 19.71s/it]
[WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:04,484 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:06,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:08,807 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:10,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:12,978 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:15,046 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:17,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:19,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:21,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:23,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:25,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:27,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:29,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:31,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:32,955 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:34,998 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:36,890 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:38,762 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:40,610 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:42,454 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:44,288 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:46,080 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:47,870 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:51,454 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:53,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:54,888 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:56,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:23:58,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:00,648 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:02,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:05,610 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:07,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:08,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:11,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:13,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:14,788 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:17,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:19,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:21,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:23,205 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:25,791 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:24:25,791 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:24:27,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:24:29,642 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:24:32,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:24:34,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:24:35,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:37,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:39,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:41,837 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:43,755 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:45,745 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:47,533 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:49,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:51,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:53,639 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:55,929 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:56,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:24:58,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:02,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:05,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:09,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:13,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:16,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:20,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:23,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:27,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0439, 'learning_rate': 6.0600000000000004e-06, 'epoch': 0.45}
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:31,334 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:34,937 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:38,454 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:42,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:45,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:49,125 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:52,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:56,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:25:59,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:26:03,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:26:06,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:26:10,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:26:13,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:26:17,288 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:26:20,794 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:26:24,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0405, 'learning_rate': 6.18e-06, 'epoch': 0.46}
[WARNING|modeling_utils.py:388] 2022-03-27 20:26:27,715 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:26:31,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:26:34,553 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:26:37,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:26:41,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0312, 'learning_rate': 6.2399999999999995e-06, 'epoch': 0.47}
{'loss': 0.0284, 'learning_rate': 6.3e-06, 'epoch': 0.47}
  5%|███▋ | 106/2230 [39:52<15:03:41, 25.53s/it]
{'loss': 0.0313, 'learning_rate': 6.36e-06, 'epoch': 0.48}
{'loss': 0.0305, 'learning_rate': 6.42e-06, 'epoch': 0.48}
{'loss': 0.0268, 'learning_rate': 6.48e-06, 'epoch': 0.48}
5%|███▊ | 109/2230 [41:15<15:42:20, 26.66s/it]
{'loss': 0.0263, 'learning_rate': 6.54e-06, 'epoch': 0.49}
{'loss': 0.0382, 'learning_rate': 6.6e-06, 'epoch': 0.49}
{'loss': 0.0348, 'learning_rate': 6.660000000000001e-06, 'epoch': 0.5}
{'loss': 0.0268, 'learning_rate': 6.72e-06, 'epoch': 0.5}
{'loss': 0.0309, 'learning_rate': 6.78e-06, 'epoch': 0.51}
{'loss': 0.0296, 'learning_rate': 6.840000000000001e-06, 'epoch': 0.51}
5%|███▉ | 115/2230 [43:52<15:23:30, 26.20s/it]
{'loss': 0.0267, 'learning_rate': 6.900000000000001e-06, 'epoch': 0.52}
5%|████ | 116/2230 [44:18<15:14:39, 25.96s/it]
{'loss': 0.0332, 'learning_rate': 6.96e-06, 'epoch': 0.52}
{'loss': 0.0335, 'learning_rate': 7.0200000000000006e-06, 'epoch': 0.52}
{'loss': 0.0348, 'learning_rate': 7.08e-06, 'epoch': 0.53}
{'loss': 0.0309, 'learning_rate': 7.14e-06, 'epoch': 0.53}
{'loss': 0.0235, 'learning_rate': 7.2e-06, 'epoch': 0.54}
  5%|████▏ | 121/2230 [46:24<14:49:19, 25.30s/it]
{'loss': 0.0192, 'learning_rate': 7.26e-06, 'epoch': 0.54}
  5%|████▏ | 122/2230 [46:49<14:40:33, 25.06s/it]
{'loss': 0.024, 'learning_rate': 7.32e-06, 'epoch': 0.55}
{'loss': 0.0287, 'learning_rate': 7.3800000000000005e-06, 'epoch': 0.55}
{'loss': 0.0191, 'learning_rate': 7.44e-06, 'epoch': 0.56}
{'loss': 0.0272, 'learning_rate': 7.5e-06, 'epoch': 0.56}
{'loss': 0.0321, 'learning_rate': 7.5600000000000005e-06, 'epoch': 0.57}
[WARNING|modeling_utils.py:388] 2022-03-27 20:36:36,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0225, 'learning_rate': 7.62e-06, 'epoch': 0.57}
{'loss': 0.0352, 'learning_rate': 7.680000000000001e-06, 'epoch': 0.57}
{'loss': 0.0323, 'learning_rate': 7.74e-06, 'epoch': 0.58}
{'loss': 0.0232, 'learning_rate': 7.8e-06, 'epoch': 0.58}
{'loss': 0.0259, 'learning_rate': 7.860000000000001e-06, 'epoch': 0.59}
{'loss': 0.0175, 'learning_rate': 7.92e-06, 'epoch': 0.59}
[WARNING|modeling_bart.py:1051] 2022-03-27 20:38:45,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
6%|████▌ | 133/2230 [51:07<13:21:23, 22.93s/it]
{'loss': 0.0266, 'learning_rate': 7.98e-06, 'epoch': 0.6}
[WARNING|modeling_utils.py:388] 2022-03-27 20:39:16,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:39:20,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0213, 'learning_rate': 8.040000000000001e-06, 'epoch': 0.6}
[WARNING|modeling_utils.py:388] 2022-03-27 20:39:32,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 20:39:40,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0272, 'learning_rate': 8.1e-06, 'epoch': 0.61}
[WARNING|modeling_utils.py:388] 2022-03-27 20:39:50,565 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:40:00,947 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▋ | 136/2230 [52:11<12:38:55, 21.75s/it]
{'loss': 0.0193, 'learning_rate': 8.160000000000001e-06, 'epoch': 0.61}
[WARNING|modeling_utils.py:388] 2022-03-27 20:40:11,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:40:21,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▋ | 137/2230 [52:31<12:24:48, 21.35s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 20:40:27,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:40:34,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:40:45,929 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0233, 'learning_rate': 8.28e-06, 'epoch': 0.62}
[WARNING|modeling_utils.py:388] 2022-03-27 20:40:52,091 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:40:58,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:00,350 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 20:41:04,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0135, 'learning_rate': 8.340000000000001e-06, 'epoch': 0.62}
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:08,533 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 20:41:12,704 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:16,521 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:18,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:21,009 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 20:41:25,181 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:28,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:30,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:33,078 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:35,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:37,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:39,425 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:41,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:43,645 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:45,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:47,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:49,594 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:51,533 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:53,457 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:55,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:57,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:41:59,259 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:42:01,107 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:42:02,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:42:04,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:42:06,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:42:08,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:42:11,999 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:42:13,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:42:15,473 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:17,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:18,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:20,461 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:22,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:26,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:42:27,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:29,379 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:32,414 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:33,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:36,686 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:38,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:42:39,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:42,205 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:44,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:46,057 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:47,325 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:49,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:42:52,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:54,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:55,784 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:42:58,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:43:00,225 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:43:02,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:43:04,313 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:43:06,355 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:43:08,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:43:10,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:43:12,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:43:14,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:43:16,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:43:19,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:43:22,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:43:26,207 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:43:29,856 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:43:33,478 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:43:37,101 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:43:40,679 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:43:44,274 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:43:47,920 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0357, 'learning_rate': 9.06e-06, 'epoch': 0.68} [WARNING|modeling_utils.py:388] 2022-03-27 20:43:51,550 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:43:55,113 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:43:58,663 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:44:02,222 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:44:05,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:44:09,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:44:12,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:44:16,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:44:19,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:44:19,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:44:23,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:44:26,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:44:30,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:44:33,902 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:44:33,902 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:44:37,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:44:40,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:44:44,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:44:47,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:44:51,339 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:44:54,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:44:58,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-27 20:45:01,715 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:45:05,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:45:08,640 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0281, 'learning_rate': 9.24e-06, 'epoch': 0.69} [WARNING|modeling_utils.py:388] 2022-03-27 20:45:08,640 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0386, 'learning_rate': 9.3e-06, 'epoch': 0.7} [WARNING|modeling_utils.py:388] 2022-03-27 20:45:08,640 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0277, 'learning_rate': 9.36e-06, 'epoch': 0.7}
{'loss': 0.0378, 'learning_rate': 9.42e-06, 'epoch': 0.7}
{'loss': 0.0345, 'learning_rate': 9.48e-06, 'epoch': 0.71}
{'loss': 0.0295, 'learning_rate': 9.54e-06, 'epoch': 0.71}
{'loss': 0.0255, 'learning_rate': 9.600000000000001e-06, 'epoch': 0.72}
{'loss': 0.0282, 'learning_rate': 9.66e-06, 'epoch': 0.72}
{'loss': 0.0302, 'learning_rate': 9.72e-06, 'epoch': 0.73}
{'loss': 0.0328, 'learning_rate': 9.780000000000001e-06, 'epoch': 0.73}
{'loss': 0.035, 'learning_rate': 9.84e-06, 'epoch': 0.74}
{'loss': 0.0324, 'learning_rate': 9.9e-06, 'epoch': 0.74}
{'loss': 0.0242, 'learning_rate': 9.960000000000001e-06, 'epoch': 0.74}
{'loss': 0.0275, 'learning_rate': 1.002e-05, 'epoch': 0.75}
[WARNING|modeling_utils.py:388] 2022-03-27 20:51:17,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0283, 'learning_rate': 1.008e-05, 'epoch': 0.75}
8%|█████▋ | 169/2230 [1:03:55<14:43:54, 25.73s/it]
{'loss': 0.0178, 'learning_rate': 1.0140000000000001e-05, 'epoch': 0.76}
{'loss': 0.0223, 'learning_rate': 1.02e-05, 'epoch': 0.76}
{'loss': 0.022, 'learning_rate': 1.0260000000000002e-05, 'epoch': 0.77}
{'loss': 0.0219, 'learning_rate': 1.032e-05, 'epoch': 0.77}
{'loss': 0.0388, 'learning_rate': 1.0379999999999999e-05, 'epoch': 0.78}
{'loss': 0.0265, 'learning_rate': 1.044e-05, 'epoch': 0.78}
[WARNING|modeling_bart.py:1051] 2022-03-27 20:54:03,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.024, 'learning_rate': 1.05e-05, 'epoch': 0.78}
{'loss': 0.027, 'learning_rate': 1.0559999999999999e-05, 'epoch': 0.79}
8%|█████▉ | 177/2230 [1:07:12<13:54:24, 24.39s/it]
{'loss': 0.021, 'learning_rate': 1.068e-05, 'epoch': 0.8}
{'loss': 0.0268, 'learning_rate': 1.074e-05, 'epoch': 0.8}
8%|██████ | 180/2230 [1:08:22<13:26:07, 23.59s/it]
{'loss': 0.0304, 'learning_rate': 1.08e-05, 'epoch': 0.81}
8%|██████ | 181/2230 [1:08:45<13:17:51, 23.36s/it]
{'loss': 0.0324, 'learning_rate': 1.086e-05, 'epoch': 0.81}
[WARNING|modeling_bart.py:1051] 2022-03-27 20:57:02,127 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0202, 'learning_rate': 1.092e-05, 'epoch': 0.82}
{'loss': 0.0231, 'learning_rate': 1.098e-05, 'epoch': 0.82}
[WARNING|modeling_utils.py:388] 2022-03-27 20:57:44,381 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0346, 'learning_rate': 1.104e-05, 'epoch': 0.83}
[WARNING|modeling_utils.py:388] 2022-03-27 20:57:56,500 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
8%|██████▏ | 185/2230 [1:10:14<12:42:19, 22.37s/it]
{'loss': 0.0235, 'learning_rate': 1.11e-05, 'epoch': 0.83}
[WARNING|modeling_utils.py:388] 2022-03-27 20:58:15,021 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:58:22,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:58:25,454 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
8%|██████▎ | 186/2230 [1:10:35<12:27:25, 21.94s/it]
{'loss': 0.0192, 'learning_rate': 1.116e-05, 'epoch': 0.83}
[WARNING|modeling_utils.py:388] 2022-03-27 20:58:33,345 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:58:39,753 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-27 20:58:49,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.028, 'learning_rate': 1.1220000000000001e-05, 'epoch': 0.84}
[WARNING|modeling_bart.py:1051] 2022-03-27 20:58:58,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 20:59:04,085 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 20:59:10,325 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0266, 'learning_rate': 1.128e-05, 'epoch': 0.84}
[WARNING|modeling_utils.py:388] 2022-03-27 20:59:18,298 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 20:59:22,724 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-27 20:59:28,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
8%|██████▎ | 189/2230 [1:11:36<11:47:45, 20.81s/it]
[WARNING|modeling_utils.py:388] 2022-03-27 20:59:32,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-27 20:59:40,664 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 20:59:40,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:59:44,613 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:59:46,934 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:59:46,934 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 20:59:46,934 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0215, 'learning_rate': 1.1400000000000001e-05, 'epoch': 0.85} [WARNING|modeling_utils.py:388] 2022-03-27 20:59:52,807 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 20:59:52,807 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 20:59:56,962 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 20:59:59,182 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:01,395 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:01,395 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 21:00:05,137 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 21:00:07,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-27 21:00:07,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 21:00:09,533 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 21:00:11,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-27 21:00:11,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:15,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:17,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:19,592 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 21:00:21,624 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:23,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:23,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:25,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:27,769 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:29,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:31,688 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 21:00:33,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:35,459 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:37,333 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:39,217 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:39,217 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:41,206 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:43,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-27 21:00:44,811 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:48,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:49,993 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:49,993 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:54,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-27 21:00:54,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-27 20:23:00,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...